NOVEL CODING SEQUENCES 
FIELD OF THE INVENTION 

This invention relates to newly identified polynucleotides and polypeptides, their 
production and uses, and their variants, agonists and antagonists and the uses thereof. In 
particular, in these and in other regards, the invention relates to the novel polynucleotides and 
polypeptides set forth in Table 1. 
BACKGROUND OF THE INVENTION 

The streptococci make up a medically important genus of microbes known to 
cause several types of disease in humans, including otitis media, pneumonia and meningitis. 
Since its isolation more than 100 years ago, Streptococcus pneumoniae (herein S. 
pneumoniae) has been one of the more intensively studied microbes. For example, much of 
our early understanding that DNA is, in fact, the genetic material was predicated on the 
work of Griffith and of Avery, MacLeod and McCarty using this microbe. Despite the vast 
amount of research with S. pneumoniae, many questions concerning the virulence of this 
microbe remain. 

While certain streptococcal factors associated with pathogenicity have been 
identified, e.g., capsule polysaccharides, peptidoglycans, pneumolysins, PspA, complement 
factor H binding component, autolysin, neuraminidase, peptide permeases, hydrogen 
peroxide and IgA1 protease, the list is certainly not complete. Further, very little is known 
concerning the temporal expression of such genes during infection and disease progression 
in a mammalian host. Discovering the sets of genes the bacterium is likely to be expressing 
at the different stages of infection, particularly when an infection is established, provides 
critical information for the screening and characterization of novel antibacterials which can 
interrupt pathogenesis. In addition to providing a fuller understanding of known proteins, 
such an approach will identify previously unrecognised targets. 

GUG, rather than AUG, is used as the initiation codon for a significant number 
of mRNAs in both Gram-positive and Gram-negative bacteria. Statistics on the frequency 
of NTG codons as start codons for several bacterial species are available online at 
http://biochem.otago.ac.nz:800/Transterm/home_page.html. 

A discussion of initiation codons in B. subtilis is set forth in Vellanoweth, R.L., 1993, 
in Bacillus subtilis and Other Gram-Positive Bacteria: Biochemistry, Physiology, and 
Molecular Genetics, Sonenshein, Hoch and Losick, eds., Amer. Soc. Microbiol., Washington, 
DC, pp. 699-711. Vellanoweth indicates that a major difference between B. subtilis and the 
Gram-negative organisms lies in the choice of initiation codon. 91% of the sequenced E. coli 
genes start with AUG. By contrast, about 30% of the genes of B. subtilis and other members 
of the clostridial branch start with UUG or GUG. Moreover, CUG functions as a start codon 
in B. subtilis. Mutation of an AUG initiation codon to GUG or UUG often causes decreased 
expression in B. subtilis and E. coli. Generally, translation efficiency is higher with AUG 
initiation codons. A strong Shine-Dalgarno ribosome binding site, however, can compensate 
almost fully for a weak initiation codon. It has been reported that genes with a range of 
expression levels have initiation codons other than ATG in Gram-positive bacteria 
(Vellanoweth, R.L., 1993, in Bacillus subtilis and Other Gram-Positive Bacteria: 
Biochemistry, Physiology, and Molecular Genetics, Sonenshein, Hoch and Losick, eds., 
Amer. Soc. Microbiol., Washington, DC, pp. 699-711). 
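Start-codon frequency statistics of the kind cited above amount to tallying the first codon of each annotated ORF. The following is a minimal Python sketch, assuming the ORFs are already available as DNA strings that begin at their annotated initiation codon; the function name and the toy sequences are hypothetical and are not sequences from Table 1.

```python
from collections import Counter

# Start codons written in the DNA alphabet (ATG corresponds to AUG in mRNA).
NTG_STARTS = ("ATG", "GTG", "TTG", "CTG")


def start_codon_usage(orfs):
    """Return the fraction of ORFs that begin with each NTG start codon.

    `orfs` is an iterable of DNA strings, each assumed to begin at its
    annotated initiation codon. Any other first codon is counted as "other".
    """
    counts = Counter()
    for seq in orfs:
        codon = seq[:3].upper()
        counts[codon if codon in NTG_STARTS else "other"] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: counts[codon] / total for codon in NTG_STARTS + ("other",)}


# Hypothetical toy ORFs, for illustration only.
toy_orfs = ["ATGAAACGTTAA", "GTGCCTGAATGA", "atgggctag", "TTGAAATAA"]
usage = start_codon_usage(toy_orfs)
print(usage)
```

Lower-case input is normalised, so mixed-case sequence files can be tallied directly.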

Provided herein are ORF sequences from genes possessing GUG initiation codons, 
the proteins expressed therefrom and homologues thereof, for use in screening for 
antimicrobial compounds. Clearly, there is a need for polypeptide and polynucleotide 
sequences that may be used to screen for antimicrobial compounds and that may also be used to 
determine the roles of such sequences in the pathogenesis of infection, dysfunction and disease. 
There is therefore also a need for the identification and characterization of such sequences, which 
may play a role in preventing, ameliorating or correcting infections, dysfunctions or diseases. 

The polypeptides of the invention have amino acid sequence homology to a known 
protein(s) as set forth in Table 1. 
SUMMARY OF THE INVENTION 

It is an object of the invention to provide polypeptides that have been identified as novel 
polypeptides by homology between an amino acid sequence selected from the group consisting 
of the sequences set out in Table 1 and a known amino acid sequence or sequences of other 
proteins, such as the protein identities listed in Table 1. 

It is a further object of the invention to provide polynucleotides that encode novel 
polypeptides, particularly polynucleotides that encode polypeptides of Streptococcus 
pneumoniae. 

In a particularly preferred embodiment of the invention the polynucleotide comprises a 
region encoding a polypeptide comprising an amino acid sequence selected from the group 
consisting of the sequences set out in Table 1, or a variant of any of these sequences. 

In another particularly preferred embodiment of the invention there is a novel 
protein from Streptococcus pneumoniae comprising an amino acid sequence selected from the 
group consisting of the sequences set out in Table 1, or a variant of any of these sequences. 

WO 98/19689
PCT/US97/19226

In accordance with another aspect of the invention, there is provided an isolated nucleic 
acid molecule encoding a mature polypeptide expressible by the Streptococcus pneumoniae 
0100993 strain contained in the deposited strain. 

In a further aspect of the invention there are provided isolated nucleic acid molecules 
encoding a polypeptide of the invention, particularly a Streptococcus pneumoniae polypeptide, 
including mRNAs, cDNAs and genomic DNAs. Further embodiments of the invention include 
biologically, diagnostically, prophylactically, clinically or therapeutically useful variants thereof, 
and compositions comprising the same. 

In accordance with another aspect of the invention, there is provided the use of a 
polynucleotide of the invention for therapeutic or prophylactic purposes, in particular 
genetic immunization. Among the particularly preferred embodiments of the invention are 
naturally occurring allelic variants of a polynucleotide of the invention and the polypeptides 
encoded thereby. 

In another aspect of the invention there are provided novel polypeptides of Streptococcus 
pneumoniae as well as biologically, diagnostically, prophylactically, clinically or therapeutically 
useful variants thereof, and compositions comprising the same. 

Among the particularly preferred embodiments of the invention are variants of the 
polypeptides of the invention encoded by naturally occurring alleles of their genes. 

In a preferred embodiment of the invention there are provided methods for producing the 
aforementioned polypeptides. 

In accordance with yet another aspect of the invention, there are provided inhibitors 
of such polypeptides, useful as antibacterial agents, including, for example, antibodies. 

In accordance with certain preferred embodiments of the invention, there are provided 
products, compositions and methods for assessing expression of the polypeptides and 
polynucleotides of the invention; treating disease, including, for example, otitis 
media, conjunctivitis, pneumonia, bacteremia, meningitis, sinusitis, pleural empyema and 
endocarditis, and most particularly meningitis, such as for example infection of cerebrospinal 
fluid; assaying genetic variation; and administering a polypeptide or polynucleotide of the 
invention to an organism to raise an immunological response against a bacterium, especially 
Streptococcus pneumoniae. 

In accordance with certain preferred embodiments of this and other aspects of the 
invention there are provided polynucleotides that hybridize to a polynucleotide sequence of the 
invention, particularly under stringent conditions. 
In certain preferred embodiments of the invention there are provided antibodies against 
polypeptides of the invention. 

In other embodiments of the invention there are provided methods for identifying 
compounds which bind to or otherwise interact with and inhibit or activate an activity of a 
polypeptide or polynucleotide of the invention, comprising: contacting a polypeptide or 
polynucleotide of the invention with a compound to be screened under conditions that permit 
binding or other interaction between the compound and the polypeptide or polynucleotide, in 
order to assess that binding or interaction, such binding or interaction being associated with a 
second component capable of providing a detectable signal in response to the binding or 
interaction of the polypeptide or polynucleotide with the compound; and determining whether 
the compound binds to or otherwise interacts with and activates or inhibits an activity of the 
polypeptide or polynucleotide by detecting the presence or absence of a signal generated from 
the binding or interaction of the compound with the polypeptide or polynucleotide. 
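Deciding whether a signal is "present or absent" in such a screen needs an explicit cut-off. The following is a minimal Python sketch, assuming a signal that increases on binding and a set of negative-control reactions without test compound; the mean-plus-three-standard-deviations rule, the function name and the readings are hypothetical choices for illustration, not part of the method described above.

```python
from statistics import mean, stdev


def call_hits(control_signals, compound_signals, k=3.0):
    """Flag compounds whose detectable signal exceeds the negative-control background.

    control_signals: readings from reactions without test compound (at least two).
    compound_signals: mapping of compound identifier -> reading with that compound.
    k: how many control standard deviations above the control mean count as
       "signal present" (an assumed cut-off).
    Returns (calls, threshold), where calls maps identifier -> True/False.
    """
    threshold = mean(control_signals) + k * stdev(control_signals)
    calls = {name: value > threshold for name, value in compound_signals.items()}
    return calls, threshold


# Hypothetical plate readings, for illustration only.
controls = [100.0, 104.0, 96.0, 100.0]
readings = {"cmpd-A": 180.0, "cmpd-B": 101.0}
hits, cutoff = call_hits(controls, readings)
print(hits, round(cutoff, 2))
```

For an assay format in which binding or inhibition lowers the signal, the comparison would be inverted to test for readings below mean minus k standard deviations.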

In accordance with yet another aspect of the invention, there are provided agonists and 
antagonists of the polypeptides and polynucleotides of the invention, preferably bacteriostatic or 
bactericidal agonists and antagonists. 

In a further aspect of the invention there are provided compositions comprising a 
polynucleotide or a polypeptide of the invention for administration to a cell or to a multicellular 
organism. 

Various changes and modifications within the spirit and scope of the disclosed invention 
will become readily apparent to those skilled in the art from reading the following descriptions 
and from reading the other parts of the present disclosure. 
GLOSSARY 

The following definitions are provided to facilitate understanding of certain terms used 
frequently herein. 

"Disease(s)" means any bacterial infection, but preferably a streptococcal infection, such 
as otitis media, conjunctivitis, pneumonia, bacteremia, meningitis, sinusitis, pleural empyema, 
endocarditis and infection of cerebrospinal fluid. 

"Host cell" is a cell which has been transformed or transfected, or is capable of 
transformation or transfection by an exogenous polynucleotide sequence. 

"Identity," as known in the art, is a relationship between two or more polypeptide 
sequences or two or more polynucleotide sequences, as determined by comparing the sequences. 
In the art, "identity" also means the degree of sequence relatedness between polypeptide or 
polynucleotide sequences, as the case may be, as determined by the match between strings 
of such sequences. "Identity" and "similarity" can be readily calculated by known methods, 
including but not limited to those described in Computational Molecular Biology, Lesk, 
A.M., ed., Oxford University Press, New York, 1988; Biocomputing: Informatics and 
Genome Projects, Smith, D.W., ed., Academic Press, New York, 1993; Computer Analysis 
of Sequence Data, Part I, Griffin, A.M., and Griffin, H.G., eds., Humana Press, New Jersey, 
1994; Sequence Analysis in Molecular Biology, von Heijne, G., Academic Press, 1987; 
Sequence Analysis Primer, Gribskov, M. and Devereux, J., eds., M Stockton Press, New 
York, 1991; and Carrillo, H., and Lipman, D., SIAM J. Applied Math., 48: 1073 (1988). 
Preferred methods to determine identity are designed to give the largest match between the 
sequences tested. Methods to determine identity and similarity are codified in publicly 
available computer programs. Preferred computer program methods to determine identity 
and similarity between two sequences include, but are not limited to, the GCG program 
package (Devereux, J., et al., Nucleic Acids Research 12(1): 387 (1984)), BLASTP, 
BLASTN, and FASTA (Altschul, S.F., et al., J. Molec. Biol. 215: 403-410 (1990)). The 
BLASTX program is publicly available from NCBI and other sources (BLAST Manual, 
Altschul, S., et al., NCBI NLM NIH Bethesda, MD 20894; Altschul, S., et al., J. Mol. Biol. 
215: 403-410 (1990)). As an illustration, by a polynucleotide having a nucleotide sequence 
having at least, for example, 95% "identity" to a reference nucleotide sequence it is 
intended that the nucleotide sequence of the tested polynucleotide is identical to the 
reference sequence except that the polynucleotide sequence may include up to five point 
mutations per each 100 nucleotides of the reference nucleotide sequence. In other words, to 
obtain a polynucleotide having a nucleotide sequence at least 95% identical to a reference 
nucleotide sequence, up to 5% of the nucleotides in the reference sequence may be deleted 
or substituted with another nucleotide, or a number of nucleotides up to 5% of the total 
nucleotides in the reference sequence may be inserted into the reference sequence. These 
mutations of the reference sequence may occur at the 5' or 3' terminal positions of the 
reference nucleotide sequence or anywhere between those terminal positions, interspersed 
either individually among nucleotides in the reference sequence or in one or more 
contiguous groups within the reference sequence. Analogously, by a polypeptide having 
an amino acid sequence having at least, for example, 95% identity to a reference amino acid 
sequence it is intended that the amino acid sequence of the test polypeptide is identical to the 
reference sequence except that the polypeptide sequence may include up to five amino acid 
alterations per each 100 amino acids of the reference amino acid sequence. In other words, to 
obtain a polypeptide having an amino acid sequence at least 95% identical to a reference amino 
acid sequence, up to 5% of the amino acid residues in the reference sequence may be 
deleted or substituted with another amino acid, or a number of amino acids up to 5% of the 
total amino acid residues in the reference sequence may be inserted into the reference 
sequence. These alterations of the reference sequence may occur at the amino or carboxy 
terminal positions of the reference amino acid sequence or anywhere between those terminal 
positions, interspersed either individually among residues in the reference sequence or in 
one or more contiguous groups within the reference sequence. 
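The counting rule in this definition can be made concrete. The following is a minimal Python sketch, assuming a pairwise alignment has already been produced (for example by one of the programs named above) and is supplied as two equal-length strings with "-" marking gaps; the function names are hypothetical. Alterations are counted per reference residue, and the threshold test uses integer arithmetic, so a 250-residue reference tolerates at most 12 alterations at 95% identity (12.5 rounded down, which is this sketch's reading of "up to 5%"). The same functions apply to nucleotide and amino acid alignments.

```python
def alignment_alterations(ref_aln, query_aln, gap="-"):
    """Count (substitutions, deletions, insertions) of a query relative to a reference.

    Both strings are rows of one pairwise alignment and must have equal length.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    subs = dels = ins = 0
    for r, q in zip(ref_aln, query_aln):
        if r == gap and q == gap:
            continue          # column empty in both rows
        if r == gap:
            ins += 1          # residue inserted into the reference
        elif q == gap:
            dels += 1         # reference residue deleted
        elif r != q:
            subs += 1         # reference residue substituted
    return subs, dels, ins


def meets_identity(ref_aln, query_aln, percent=95, gap="-"):
    """True if alterations do not exceed (100 - percent) per 100 reference residues."""
    ref_len = sum(1 for r in ref_aln if r != gap)
    total = sum(alignment_alterations(ref_aln, query_aln, gap))
    # Integer form of total / ref_len <= (100 - percent) / 100, avoiding float rounding.
    return total * 100 <= ref_len * (100 - percent)


ref = "ACGTACGTACGTACGTACGT"                              # 20 residues: 1 alteration allowed at 95%
print(meets_identity(ref, "ACGTACGTACGTACGTACGA"))       # one substitution
print(meets_identity(ref, "ACGAACGTACGTACGTACGA"))       # two substitutions
```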

"Isolated" means altered "by the hand of man" from its natural state, i.e., if it occurs in 
nature, it has been changed or removed from its original environment, or both. For example, a 
polynucleotide or a polypeptide naturally present in a living organism is not "isolated," but the 
same polynucleotide or polypeptide separated from the coexisting materials of its natural state is 
"isolated", as the term is employed herein. 

"Polynucleotide(s)" generally refers to any polyribonucleotide or 
polydeoxyribonucleotide, which may be unmodified RNA or DNA or modified RNA or DNA. 
"Polynucleotide(s)" include, without limitation, single- and double-stranded DNA, DNA that is a 
mixture of single- and double-stranded regions or single-, double- and triple-stranded regions, 
single- and double-stranded RNA, and RNA that is a mixture of single- and double-stranded 
regions, hybrid molecules comprising DNA and RNA that may be single-stranded or, more 
typically, double-stranded, or triple-stranded regions, or a mixture of single- and double-stranded 
regions. In addition, "polynucleotide" as used herein refers to triple-stranded regions comprising 
RNA or DNA or both RNA and DNA. The strands in such regions may be from the same 
molecule or from different molecules. The regions may include all of one or more of the 
molecules, but more typically involve only a region of some of the molecules. One of the 
molecules of a triple-helical region often is an oligonucleotide. As used herein, the term 
"polynucleotide(s)" also includes DNAs or RNAs as described above that contain one or more 
modified bases. Thus, DNAs or RNAs with backbones modified for stability or for other 
reasons are "polynucleotide(s)" as that term is intended herein. Moreover, DNAs or RNAs 
comprising unusual bases, such as inosine, or modified bases, such as tritylated bases, to name 
just two examples, are polynucleotides as the term is used herein. It will be appreciated that a 
great variety of modifications have been made to DNA and RNA that serve many useful 
purposes known to those of skill in the art. The term "polynucleotide(s)" as it is employed herein 
embraces such chemically, enzymatically or metabolically modified forms of polynucleotides, as 
well as the chemical forms of DNA and RNA characteristic of viruses and cells, including, for 
example, simple and complex cells. "Polynucleotide(s)" also embraces short polynucleotides 
often referred to as oligonucleotide(s). 

"Polypeptide(s)" refers to any peptide or protein comprising two or more amino acids 
joined to each other by peptide bonds or modified peptide bonds. "Polypeptide(s)" refers to both 
short chains, commonly referred to as peptides, oligopeptides and oligomers and to longer chains 
generally referred to as proteins. Polypeptides may contain amino acids other than the 20 gene 
encoded amino acids. "Polypeptide(s)" include those modified either by natural processes, such 
as processing and other post-translational modifications, or by chemical modification 
techniques. Such modifications are well described in basic texts and in more detailed 
monographs, as well as in a voluminous research literature, and they are well known to those of 
skill in the art. It will be appreciated that the same type of modification may be present in the 
same or varying degree at several sites in a given polypeptide. Also, a given polypeptide may 
contain many types of modifications. Modifications can occur anywhere in a polypeptide, 
including the peptide backbone, the amino acid side-chains, and the amino or carboxyl termini. 
Modifications include, for example, acetylation, acylation, ADP-ribosylation, amidation, 
covalent attachment of flavin, covalent attachment of a heme moiety, covalent attachment of a 
nucleotide or nucleotide derivative, covalent attachment of a lipid or lipid derivative, covalent 
attachment of phosphatidylinositol, cross-linking, cyclization, disulfide bond formation, 
demethylation, formation of covalent cross-links, formation of cysteine, formation of 
pyroglutamate, formylation, gamma-carboxylation, glycosylation, GPI anchor formation, 
hydroxylation, iodination, methylation, myristoylation, oxidation, proteolytic processing, 
phosphorylation, prenylation, racemization, glycosylation, lipid attachment, sulfation, gamma- 
carboxylation of glutamic acid residues, hydroxylation and ADP-ribosylation, selenoylation, 
sulfation, transfer-RNA mediated addition of amino acids to proteins, such as arginylation, and 
ubiquitination. See, for instance, PROTEINS - STRUCTURE AND MOLECULAR 
PROPERTIES, 2nd Ed., T. E. Creighton, W. H. Freeman and Company, New York (1993) and 
Wold, F., Posttranslational Protein Modifications: Perspectives and Prospects, pgs. 1-12 in 
POSTTRANSLATIONAL COVALENT MODIFICATION OF PROTEINS, B. C. Johnson, Ed., 
Academic Press, New York (1983); Seifter et al., Meth. Enzymol. 182:626-646 (1990) and 
Rattan et al., Protein Synthesis: Posttranslational Modifications and Aging, Ann. N.Y. Acad. 
Sci. 663: 48-62 (1992). Polypeptides may be branched or cyclic, with or without branching. 
Cyclic, branched and branched circular polypeptides may result from post-translational natural 
processes and may be made by entirely synthetic methods, as well. 
"Variant(s)" as the term is used herein, is a polynucleotide or polypeptide that 
differs from a reference polynucleotide or polypeptide respectively, but retains essential 
properties. A typical variant of a polynucleotide differs in nucleotide sequence from 
another, reference polynucleotide. Changes in the nucleotide sequence of the variant may 
or may not alter the amino acid sequence of a polypeptide encoded by the reference 
polynucleotide. Nucleotide changes may result in amino acid substitutions, additions, 
deletions, fusions and truncations in the polypeptide encoded by the reference sequence, as 
discussed below. A typical variant of a polypeptide differs in amino acid sequence from 
another, reference polypeptide. Generally, differences are limited so that the sequences of 
the reference polypeptide and the variant are closely similar overall and, in many regions, 
identical. A variant and reference polypeptide may differ in amino acid sequence by one or 
more substitutions, additions, deletions in any combination. A substituted or inserted 
amino acid residue may or may not be one encoded by the genetic code. A variant of a 
polynucleotide or polypeptide may be naturally occurring, such as an allelic variant, or it 
may be a variant that is not known to occur naturally. Non-naturally occurring variants of 
polynucleotides and polypeptides may be made by mutagenesis techniques, by direct 
synthesis, and by other recombinant methods known to skilled artisans. 
DESCRIPTION OF THE INVENTION 

Each of the polynucleotide and polypeptide sequences provided herein may be used in 
the discovery and development of antibacterial compounds. Upon expression of the 
sequences with the appropriate initiation and termination codons, the encoded polypeptide 
can be used as a target for the screening of antimicrobial drugs. Additionally, the DNA 
sequences encoding, preferably, the amino-terminal regions of the encoded protein or the 
Shine-Dalgarno region can be used to construct antisense sequences to control the 
expression of the coding sequence of interest. Furthermore, many of the sequences 
disclosed herein also provide regions upstream and downstream from the encoding 
sequence. These sequences are useful as a source of regulatory elements for the control of 
bacterial gene expression. Such sequences are conveniently isolated by restriction enzyme 
action or synthesized chemically and introduced, for example, into promoter identification 
strains. These strains contain a reporter structural gene sequence located downstream from 
a restriction site such that if an active promoter is inserted, the reporter gene will be 
expressed. 
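The sequence arithmetic behind an antisense construct directed at the amino-terminal coding region and Shine-Dalgarno region is a reverse complement of that window. The following is a minimal Python sketch, assuming a sense-strand DNA string and a known 0-based start-codon position; the function names and the default window sizes are hypothetical. Practical antisense design also weighs target accessibility and oligonucleotide chemistry, which this sketch does not address.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return dna.translate(COMPLEMENT)[::-1]


def antisense_oligo(sequence, start, upstream=10, downstream=15):
    """Antisense DNA spanning the initiation codon and nearby 5' region of an ORF.

    sequence: sense-strand DNA containing the gene and its upstream region.
    start: 0-based index of the first base of the initiation codon.
    upstream / downstream: window sizes in nucleotides (hypothetical defaults),
        clipped at the ends of `sequence`.
    """
    left = max(0, start - upstream)
    right = min(len(sequence), start + 3 + downstream)
    return reverse_complement(sequence[left:right])


# Hypothetical region: a GGAGG motif upstream of a GTG start at index 10.
region = "AAGGAGGCCCGTGAAACTG"
oligo = antisense_oligo(region, 10, upstream=8, downstream=3)
print(oligo)
```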

Although each of the sequences may be employed as described above, this 
invention also provides several means for identifying particularly useful target genes. The 
first of these approaches entails searching appropriate databases for sequence matches in 
related organisms. Thus, if a homologue exists, the Streptococcal-like form of this gene 
would likely play an analogous role. For example, a Streptococcal protein identified as 
homologous to a cell surface protein in another organism would be useful as a vaccine 
candidate. To the extent such homologies have been identified for the sequences disclosed 
herein they are reported along with the encoding sequence. 

Each of the DNA sequences provided herein may be used in the discovery and 
development of antibacterial compounds. Because each of the sequences contains an open 
reading frame (ORF) with appropriate initiation and termination codons, the encoded 
protein upon expression can be used as a target for the screening of antimicrobial drugs. 
Additionally, the DNA sequences encoding the amino terminal regions of the encoded 
protein can be used to construct antisense sequences to control the expression of the coding 
sequence of interest. Furthermore, many of the sequences disclosed herein also provide 
regions upstream and downstream from the encoding sequence. These sequences are useful 
as a source of regulatory elements for the control of bacterial gene expression. Such 
sequences are conveniently isolated by restriction enzyme action or synthesized chemically 
and introduced, for example, into promoter identification strains. These strains contain a 
reporter structural gene sequence located downstream from a restriction site such that if an 
active promoter is inserted, the reporter gene will be expressed. 

It is believed that bacteria possess a number of ways of regulating gene expression 
levels, especially to subtle degrees, and that for these genes the interplay between the 
ribosome binding site and the initiation codon is utilized for this purpose. It is also believed 
that such genes will be important targets for antimicrobial drug discovery, particularly since 
pathogenesis genes are believed to undergo regulation of gene expression during the 
pathogenesis process. Therefore, the invention provides ORF sequences possessing a GTG 
(GUG) initiation codon and protein targets expressed therefrom. 
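Candidate GTG-initiated ORFs can be listed from genomic sequence by a simple frame scan. The following is a minimal Python sketch, assuming a forward-strand DNA string and the standard bacterial stop codons TAA, TAG and TGA; the function name and the 30-codon minimum are hypothetical choices. In each frame it records only the most upstream GTG after the previous in-frame stop. Because an in-frame ATG or other NTG codon upstream may be the true start, the output is a candidate list only; assigning the actual start would rely on ribosome binding site and homology evidence.

```python
STOPS = {"TAA", "TAG", "TGA"}


def gtg_orfs(dna, min_codons=30):
    """Find forward-strand ORFs that begin with GTG and end at an in-frame stop.

    Returns sorted (start, end) pairs, 0-based with `end` exclusive and including
    the stop codon. Length in codons counts the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "GTG":
                start = i                      # first GTG since the last stop in this frame
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None                   # reset and look for the next GTG
    return sorted(orfs)


# Hypothetical toy sequence: GTG AAA AAA AAA TAA in the third frame.
toy = "CCGTG" + "AAA" * 3 + "TAAG"
print(gtg_orfs(toy, min_codons=5))
```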

ORF Gene Expression 

Recently, techniques have become available to evaluate temporal gene expression in 
bacteria, particularly as it applies to viability under laboratory and infection conditions. A 
number of methods can be used to identify genes which are essential to survival per se, or 
essential to the establishment/maintenance of an infection. Identification of an unknown ORF 
by one of these methods yields additional information about its function and 
permits the selection of such an ORF for further development as a screening target. Briefly, 
these approaches include: 

1) Signature Tagged Mutagenesis (STM): This technique is described by Hensel 
et al., Science 269: 400-403 (1995), the contents of which are incorporated by reference for 
background purposes. Signature tagged mutagenesis identifies genes necessary for the 
establishment/maintenance of infection in a given infection model. 

The basis of the technique is the random mutagenesis of the target organism by various 
means (e.g., transposons) such that unique DNA sequence tags are inserted in close 
proximity to the site of mutation. The tags from a mixed population of bacterial mutants 
and from bacteria recovered from infected hosts are detected by amplification, radiolabeling 
and hybridisation analysis. Mutants attenuated in virulence are revealed by absence of the 
tag from the pool of bacteria recovered from infected hosts. 
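The comparison at the heart of STM, tags present in the inoculum but missing from bacteria recovered from infected hosts, can be sketched as a set difference over hybridisation signals. The following is a minimal Python sketch, assuming per-tag signal values for the input and recovered pools; the function name, the tag names and the detection limit are hypothetical.

```python
def attenuated_mutants(input_signal, output_signal, detection_limit=1.0):
    """Return signature tags detected in the inoculum but not in the recovered pool.

    input_signal: mapping tag -> hybridisation signal for the mixed mutant
        population used to infect hosts.
    output_signal: mapping tag -> signal for bacteria recovered from hosts.
    detection_limit: signal at or below which a tag is treated as absent
        (an assumed threshold; in practice it would be set from background).
    Tags not detected in the inoculum are ignored, because their absence from
    the recovered pool would say nothing about virulence.
    """
    present_in = {tag for tag, s in input_signal.items() if s > detection_limit}
    present_out = {tag for tag, s in output_signal.items() if s > detection_limit}
    # Candidate attenuated mutants: tagged in the input, lost during infection.
    return sorted(present_in - present_out)


inoculum = {"tag01": 50.0, "tag02": 40.0, "tag03": 0.5}
recovered = {"tag01": 45.0, "tag02": 0.2}
print(attenuated_mutants(inoculum, recovered))
```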

In Streptococcus pneumoniae, because the transposon system is less well 
developed, a more efficient way of creating the tagged mutants is to use the insertion-duplication 
mutagenesis technique as described by Morrison et al., J. Bacteriol. 159:870 
(1984), the contents of which are incorporated by reference for background purposes. 

2) In Vivo Expression Technology (IVET): This technique is described by 
Camilli et al., Proc. Natl. Acad. Sci. USA 91:2634-2638 (1994), the contents of which are 
incorporated by reference for background purposes. IVET identifies genes up-regulated 
during infection when compared to laboratory cultivation, implying an important role in 
infection. ORFs identified by this technique are implied to have a significant role in 
infection establishment/maintenance. 

In this technique, random chromosomal fragments of the target organism are cloned 
upstream of a promoter-less recombinase gene in a plasmid vector. This construct is 
introduced into the target organism, which carries an antibiotic resistance gene flanked by 
resolvase sites. Growth in the presence of the antibiotic removes from the population those 
fragments cloned into the plasmid vector that are capable of supporting transcription of the 
recombinase gene and have therefore caused loss of antibiotic resistance. The resistant pool 
is introduced into a host, and at various times after infection bacteria may be recovered and 
assessed for the presence of antibiotic resistance. The chromosomal fragment carried by 
each antibiotic-sensitive bacterium should carry a promoter or portion of a gene normally 
up-regulated during infection. Sequencing upstream of the recombinase gene allows 
identification of the up-regulated gene. 
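Because clones are sampled at several times after infection, the bookkeeping step is to note, for each cloned fragment, when its host bacterium was first recovered antibiotic-sensitive. The following is a minimal Python sketch, assuming each recovered clone is recorded as an (hours after infection, fragment identifier, resistant?) tuple; that record layout and the function name are hypothetical.

```python
def ivet_candidates(recoveries):
    """Earliest sampling time at which each cloned fragment's host was antibiotic-sensitive.

    recoveries: iterable of (hours_after_infection, fragment_id, is_resistant)
        tuples, one per recovered clone (a hypothetical record layout).
    A sensitive clone implies the fragment supported recombinase transcription in
    the host, causing loss of antibiotic resistance, so the fragment is a candidate
    promoter of a gene up-regulated during infection. Resistant clones are ignored.
    """
    first_sensitive = {}
    for hours, fragment, resistant in recoveries:
        if resistant:
            continue
        previous = first_sensitive.get(fragment)
        if previous is None or hours < previous:
            first_sensitive[fragment] = hours
    return first_sensitive


records = [(24, "f1", True), (48, "f1", False), (24, "f2", False),
           (72, "f2", False), (12, "f3", True)]
print(ivet_candidates(records))
```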

3) Differential display: This technique is described by Chuang et al., J. 
Bacteriol. 175:2026-2036 (1993), the contents of which are incorporated by reference for 
background purposes. This method identifies those genes which are expressed in an 
organism by identifying the mRNA present using randomly primed RT-PCR. By comparing 
pre-infection and post-infection profiles, genes up- and down-regulated during infection can 
be identified, and the RT-PCR products sequenced and matched to ORF 'unknowns'. 
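Comparing pre-infection and post-infection profiles comes down to a fold-change call per RT-PCR product. The following is a minimal Python sketch, assuming band intensities keyed by a product identifier in each profile; the two-fold cut-off, the pseudocount and the function name are hypothetical choices.

```python
import math


def regulated_products(pre, post, min_fold=2.0, pseudocount=1.0):
    """Classify RT-PCR products as "up" or "down" during infection.

    pre / post: mapping of product identifier -> band intensity before and
        after infection. A product missing from one profile is treated as 0.
    pseudocount: added to both intensities so absent bands give a finite ratio.
    min_fold: fold change treated as meaningful (an assumed cut-off).
    Products below the cut-off in both directions are left out of the result.
    """
    cutoff = math.log2(min_fold)
    calls = {}
    for product in pre.keys() | post.keys():
        a = pre.get(product, 0.0) + pseudocount
        b = post.get(product, 0.0) + pseudocount
        lfc = math.log2(b / a)        # log2 fold change, post relative to pre
        if lfc >= cutoff:
            calls[product] = "up"
        elif lfc <= -cutoff:
            calls[product] = "down"
    return calls


before = {"p1": 9.0, "p2": 99.0, "p3": 10.0}
after = {"p1": 39.0, "p2": 9.0, "p3": 11.0, "p4": 15.0}
print(regulated_products(before, after))
```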

4) Generation of conditional lethal mutants by transposon mutagenesis: 
This technique, described by de Lorenzo, V. et al., Gene 123:17-24 (1993); Neuwald, 
A. F. et al., Gene 125:69-73 (1993); and Takiff, H. E. et al., J. Bacteriol. 174:1544-1553 
(1992), the contents of which are incorporated by reference for background 
purposes, identifies genes whose expression is essential for cell viability. 

In this technique transposons carrying controllable promoters, which provide 
transcription outward from the transposon in one or both directions, are generated. Random 
insertion of these transposons into target organisms and subsequent isolation of insertion 
mutants in the presence of an inducer of promoter activity ensures that insertions which 
separate a promoter from the coding region of a gene whose expression is essential for cell 
viability will be recovered. Subsequent replica plating in the absence of inducer identifies 
such insertions, since they fail to survive. Sequencing of the flanking regions of the 
transposon allows identification of the site of insertion and of the gene disrupted. 
Close monitoring of the changes in cellular processes/morphology during growth in the 
absence of inducer yields information on the likely function of the gene. Such monitoring 
could include flow cytometry (cell division, lysis, redox potential, DNA replication), 
incorporation of radiochemically labeled precursors into DNA, RNA, protein, lipid and 
peptidoglycan, and monitoring of reporter enzyme gene fusions which respond to known 
cellular stresses. 
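Once flanking-sequence reads have placed an inducer-dependent insertion on the chromosome, the remaining step is to relate that coordinate to annotated genes. The following is a minimal Python sketch, assuming the insertion coordinate is already known and the ORFs are forward-strand, non-overlapping and sorted by start; the function name and coordinates are hypothetical. An insertion that falls between genes is reported against the next ORF downstream, the gene an outward-reading transposon promoter would be expected to drive.

```python
from bisect import bisect_right


def map_insertion(position, orfs):
    """Locate a transposon insertion relative to annotated forward-strand ORFs.

    position: insertion coordinate inferred from flanking-sequence reads.
    orfs: list of (start, end, name), 0-based, end-exclusive, sorted by start,
        assumed non-overlapping.
    Returns ("within", name) if the insertion lies inside an ORF,
    ("upstream_of", name) for the next ORF downstream, or None if there is none.
    """
    starts = [start for start, _, _ in orfs]
    idx = bisect_right(starts, position)       # ORFs before idx start at or before position
    if idx > 0:
        start, end, name = orfs[idx - 1]
        if start <= position < end:
            return ("within", name)
    if idx < len(orfs):
        return ("upstream_of", orfs[idx][2])
    return None


annotation = [(100, 400, "orfA"), (450, 900, "orfB")]
print(map_insertion(250, annotation), map_insertion(420, annotation))
```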

5) Generation of conditional lethal mutants by chemical mutagenesis: This
technique is described by Beckwith, J., Methods in Enzymology 204: 3-18 (1991), the
contents of which are incorporated herein by reference for background
purposes. In this technique, random chemical mutagenesis of the target organism, growth at a
temperature other than physiological temperature (permissive temperature), and subsequent
replica plating and growth at a different temperature (e.g. 42°C to identify temperature-sensitive
(ts) mutants, 25°C to identify cold-sensitive (cs) mutants) are used to identify those isolates
which now fail to grow (conditional mutants). As above, close monitoring of the changes upon
growth at the non-permissive temperature yields information on the function of the mutated
gene. Complementation of the conditional lethal mutation by a library from the target organism
and sequencing of the complementing gene allows matching with an unknown ORF.
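Both the inducible-promoter transposon approach and the chemical mutagenesis approach end with the same bookkeeping step: keeping isolates that grew under permissive conditions (with inducer, or at the permissive temperature) but fail on the restrictive replica. The following is a minimal Python sketch of that comparison; the isolate IDs and growth calls are hypothetical inputs scored from master and replica plates, not data from this work.

```python
def conditional_mutants(permissive, restrictive):
    """Isolates that grew under permissive conditions but not on the restrictive replica.

    Both arguments map a (hypothetical) isolate ID to True/False for visible growth.
    An isolate missing from the restrictive plate is treated as "no growth".
    """
    return sorted(
        isolate
        for isolate, grew in permissive.items()
        if grew and not restrictive.get(isolate, False)
    )


# Hypothetical plate scores: master plate at the permissive condition, replica at 42 degrees C
master = {"c1": True, "c2": True, "c3": True, "c4": False}
replica_42c = {"c1": True, "c2": False, "c3": False, "c4": False}
print(conditional_mutants(master, replica_42c))  # ['c2', 'c3']
```

Isolate c4 is excluded because it did not grow on the master plate, so its absence on the replica says nothing about conditional lethality.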

6) RT-PCR: Streptococcus pneumoniae messenger RNA is isolated from bacterially
infected tissue, e.g. 48-hour murine lung infections, and the amount of each mRNA species is
assessed by reverse transcription of the RNA sample primed with random hexanucleotides,
followed by PCR with gene-specific primer pairs. The determination of the presence and
amount of a particular mRNA species by quantification of the resultant PCR product
provides information on the bacterial genes which are transcribed in the infected tissue.
Analysis of gene transcription can be carried out at different times of infection to gain a
detailed knowledge of gene regulation in bacterial pathogenesis, allowing for a clearer
understanding of which gene products represent targets for screens for novel antibacterials.
Because of the gene-specific nature of the PCR primers employed, it should be understood
that the bacterial mRNA preparation need not be free of mammalian RNA. This allows the
investigator to carry out a simple and quick RNA preparation from infected tissue to
obtain bacterial mRNA species which are very short-lived in the bacterium (on the order of
2-minute half-lives). Optimally, the bacterial mRNA is prepared from infected murine lung
tissue by mechanical disruption in the presence of TRIzol reagent (GIBCO-BRL) for very short
periods of time, subsequent processing according to the manufacturer's instructions for TRIzol
reagent, and DNase treatment to remove contaminating DNA. Preferably, the process is optimised
by finding those conditions which give a maximum amount of Streptococcus pneumoniae
16S ribosomal RNA as detected by probing Northern blots with a suitably labelled
sequence-specific oligonucleotide probe. Typically, a 5' dye-labelled primer is used in each PCR
primer pair in a PCR reaction which is terminated optimally between 8 and 25 cycles. The
PCR products are separated on 6% polyacrylamide gels with detection and quantification
using GeneScanner (manufactured by ABI).
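Quantified PCR product signals only become comparable between samples once they are expressed relative to some common reference. The text uses 16S rRNA to optimise the RNA preparation; using that same 16S signal as the normaliser for each gene-specific product is an assumption made here for illustration, as is the idea that all products were read within the non-saturating cycle range the text recommends. A minimal Python sketch with hypothetical peak areas:

```python
def relative_abundance(peak_areas, reference="16S"):
    """Divide each gene's PCR product signal by the reference (16S rRNA) signal.

    peak_areas maps a gene name to its quantified product signal (arbitrary units).
    Assumes all products were measured before the reaction plateaued.
    """
    ref = peak_areas[reference]
    if ref <= 0:
        raise ValueError("reference signal must be positive")
    return {gene: area / ref for gene, area in peak_areas.items() if gene != reference}


# Hypothetical signals from one infected-lung sample
areas = {"16S": 2000.0, "geneA": 500.0, "geneB": 50.0}
print(relative_abundance(areas))  # {'geneA': 0.25, 'geneB': 0.025}
```

Comparing these ratios across time points of infection is one simple way to follow the changes in transcription that the paragraph describes.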

Each of these techniques may have advantages or disadvantages depending on the
particular application. The skilled artisan would choose the approach that is most
relevant to the particular end use in mind.






WO 98/19689 



PCT/US97/19226 



Use of these technologies when applied to the ORFs of the present invention
enables identification of bacterial proteins expressed during infection, inhibitors of which
would have utility in anti-bacterial therapy.

The invention relates to novel polypeptides and polynucleotides as described in greater
detail below. In particular, the invention relates to polypeptides and polynucleotides of
Streptococcus pneumoniae, which are related by amino acid sequence homology to known
polypeptides as set forth in Table 1. The invention relates especially to compounds having the
nucleotide and amino acid sequences selected from the group consisting of the sequences set out
in Table 1, and to the nucleotide sequences of the DNA in the deposited strain and amino acid
sequences encoded thereby.

Deposited materials 

The deposit has been made under the terms of the Budapest Treaty on the International 
Recognition of the Deposit of Micro-organisms for Purposes of Patent Procedure. The strain 
will be irrevocably and without restriction or condition released to the public upon the issuance 
of a patent. The deposit is provided merely as a convenience to those of skill in the art and is not
an admission that a deposit is required for enablement, such as that required under 35 U.S.C. 
§112. 

A deposit containing a Streptococcus pneumoniae bacterial strain has been deposited 
with the National Collections of Industrial and Marine Bacteria Ltd. (NCIMB), 23 St. 
Machar Drive, Aberdeen AB2 1RY, Scotland on 11 April 1996 and assigned NCIMB Deposit
No. 40794. The Streptococcus pneumoniae bacterial strain deposit is referred to herein as "the 
deposited bacterial strain" or as "the DNA of the deposited bacterial strain." 

The deposited material is a bacterial strain that contains the full-length FabH DNA,
referred to as "NCIMB 40794" upon deposit. 

The sequence of the polynucleotides contained in the deposited material, as well as the 
amino acid sequence of the polypeptide encoded thereby, are controlling in the event of any 
conflict with any description of sequences herein. 

A license may be required to make, use or sell the deposited materials, and no such 
license is hereby granted. 

The deposited strain contains the full length genes comprising the polynucleotides set 
forth in Table 1 . The sequence of the polynucleotides contained in the deposited strain, as well 
as the amino acid sequence of the polypeptide encoded thereby, are controlling in the event of 
any conflict with any description of sequences herein. 

Polypeptides 
The polypeptides of the invention include the polypeptides set forth in Table 1 (in
particular the mature polypeptide) as well as polypeptides and fragments, particularly those
which have the biological activity of a polypeptide of the invention, and also those which have at
least 50%, 60% or 70% identity to a polypeptide sequence selected from the group consisting of
the sequences set out in Table 1 or the relevant portion, preferably at least 80% identity to a
polypeptide sequence selected from the group consisting of the sequences set out in Table 1,
more preferably at least 90% similarity (more preferably at least 90% identity) to a polypeptide
sequence selected from the group consisting of the sequences set out in Table 1, and still more
preferably at least 95% similarity (still more preferably at least 95% identity) to a polypeptide
sequence selected from the group consisting of the sequences set out in Table 1. They also include
portions of such polypeptides, with such portions generally containing at least
30 amino acids and more preferably at least 50 amino acids.
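The identity thresholds above presuppose a way of scoring two sequences against each other. The text does not name an algorithm; in practice the alignment itself comes from a program such as BLAST or GAP. The following minimal Python sketch assumes the two sequences are already aligned (equal length, '-' for gaps) and counts identical residues over all alignment columns except those gapped in both sequences. That denominator is an assumption, and other conventions exist.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two pre-aligned sequences.

    identity = identical non-gap columns / columns not gapped in both sequences * 100.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if not (x == gap and y == gap)]
    if not columns:
        return 0.0
    identical = sum(1 for x, y in columns if x == y and x != gap)
    return 100.0 * identical / len(columns)


print(percent_identity("MKTAYIAK", "MKTAHIAK"))  # 87.5
```

A single substitution in eight residues gives 87.5%, which would fall short of the 90% tier but meet the 80% tier described above.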

The invention also includes polypeptides of the formula:

$$X-(R_1)_n-(R_2)-(R_3)_n-Y$$

wherein, at the amino terminus, X is hydrogen, and at the carboxyl terminus, Y is hydrogen or a
metal, R1 and R3 are any amino acid residue, n is an integer between 1 and 2000, and R2 is an
amino acid sequence of the invention, particularly an amino acid sequence selected from the
group set forth in Table 1. In the formula above, R2 is oriented so that its amino terminal residue
is at the left, bound to R1, and its carboxy terminal residue is at the right, bound to R3. Any
stretch of amino acid residues denoted by either R group, where n is greater than 1, may be
either a heteropolymer or a homopolymer, preferably a heteropolymer. In preferred
embodiments n is an integer between 1 and 1000 or 2000.
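To make the orientation rules of the formula concrete, the following minimal Python sketch assembles the residue chain as a one-letter string. It assumes, as the single n in the formula implies, that both flanking stretches contain the same number of residues; the terminal groups X and Y are end-group chemistry and are not represented. The function name and example residues are hypothetical.

```python
def assemble_formula(r1, r2, r3, n_max=2000):
    """Return the residue chain (R1)n-(R2)-(R3)n as a one-letter string.

    r1 and r3 are the two flanking stretches and must both contain n residues,
    1 <= n <= n_max; r2 is the core sequence, written N-terminus to C-terminus.
    """
    n = len(r1)
    if len(r3) != n or not 1 <= n <= n_max:
        raise ValueError("R1 and R3 must each contain n residues with 1 <= n <= n_max")
    if not r2:
        raise ValueError("R2 must be a non-empty sequence")
    # R2's amino terminus is bound to R1 (left), its carboxy terminus to R3 (right)
    return r1 + r2 + r3


print(assemble_formula("G", "MKTAY", "S"))  # GMKTAYS
```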

A fragment is a variant polypeptide having an amino acid sequence that is entirely the
same as part, but not all, of the amino acid sequence of the aforementioned polypeptides. As with
polypeptides, fragments may be "free-standing," or comprised within a larger polypeptide of
which they form a part or region, most preferably as a single continuous region of a single larger
polypeptide.
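The definition above reduces to two checks on one-letter sequences: the candidate must occur as a contiguous run inside the full sequence, and it must be shorter than the full sequence. A minimal Python sketch, using a hypothetical sequence:

```python
def is_fragment(candidate, full):
    """True if candidate is a contiguous run of residues from full, but not all of full."""
    return 0 < len(candidate) < len(full) and candidate in full


full = "MKTAYIAKQR"  # hypothetical sequence
print(is_fragment("TAYIA", full))  # True
print(is_fragment(full, full))     # False: the whole sequence is not a fragment
```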

Preferred fragments include, for example, truncation polypeptides having a portion of 
the amino acid sequence of Table 1 , or of variants thereof, such as a continuous series of residues 
that includes the amino terminus, or a continuous series of residues that includes the carboxyl 
terminus. Degradation forms of the polypeptides of the invention in a host cell, particularly a 
Streptococcus pneumoniae, are also preferred. Further preferred are fragments characterized by 
structural or functional attributes such as fragments that comprise alpha-helix and alpha-helix-
forming regions, beta-sheet and beta-sheet-forming regions, turn and turn-forming regions, coil
and coil-forming regions, hydrophilic regions, hydrophobic regions, alpha amphipathic regions,
beta amphipathic regions, flexible regions, surface-forming regions, substrate-binding regions, and
high antigenic index regions.
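Hydrophilic and hydrophobic regions are commonly located with a sliding-window hydropathy plot. The text does not say which scale or window it has in mind; the sketch below swaps in the Kyte-Doolittle scale and a 9-residue default window as assumptions. Positive window means suggest hydrophobic stretches, negative ones hydrophilic stretches.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic, negative = hydrophilic)
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=9):
    """Mean Kyte-Doolittle value for every complete window of `window` residues."""
    s = seq.upper()
    if not 1 <= window <= len(s):
        raise ValueError("window must be between 1 and len(seq)")
    values = [KYTE_DOOLITTLE[aa] for aa in s]
    return [sum(values[i:i + window]) / window for i in range(len(s) - window + 1)]


profile = hydropathy_profile("IIIKKK", window=3)
print([round(v, 2) for v in profile])  # [4.5, 1.7, -1.1, -3.9]
```

The toy sequence moves from a hydrophobic isoleucine run into a hydrophilic lysine run, and the window means fall accordingly.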

Also preferred are biologically active fragments which are those fragments that mediate 
activities of polypeptides of the invention, including those with a similar activity or an improved 
activity, or with a decreased undesirable activity. Also included are those fragments that are 
antigenic or immunogenic in an animal, especially in a human. Particularly preferred are 
fragments comprising receptors or domains of enzymes that confer a function essential for
viability of Streptococcus pneumoniae or the ability to initiate or maintain disease in an
individual, particularly a human.

Variants that are fragments of the polypeptides of the invention may be employed for 
producing the corresponding full-length polypeptide by peptide synthesis; therefore, these 
variants may be employed as intermediates for producing the full-length polypeptides of the 
invention. 

In addition to the standard one- and three-letter representations for amino acids,
the term "X" or "Xaa" is also used. "X" and "Xaa" mean that any of the twenty naturally
occurring amino acids may appear at such a designated position in the polypeptide sequence.
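In practice, a sequence containing "X" behaves like a pattern in which that position accepts any of the twenty standard residues. A minimal Python sketch of that matching for one-letter sequences only (three-letter "Xaa" notation would first need converting):

```python
import re

TWENTY = "ACDEFGHIKLMNPQRSTVWY"  # the twenty naturally occurring amino acids


def matches_with_x(pattern, sequence):
    """True if sequence matches pattern position by position, 'X' matching any of the twenty."""
    regex = "".join(f"[{TWENTY}]" if ch == "X" else re.escape(ch) for ch in pattern.upper())
    return re.fullmatch(regex, sequence.upper()) is not None


print(matches_with_x("MKXAY", "MKTAY"))  # True
print(matches_with_x("MKXAY", "MKBAY"))  # False: B is not one of the twenty
```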

Polynucleotides 

The nucleotide sequences disclosed herein can be obtained by synthetic chemical 
techniques known in the art or can be obtained from S. pneumoniae 0100993 by probing a 
DNA preparation with probes constructed from the particular sequences disclosed herein. 
Alternatively, oligonucleotides derived from a disclosed sequence can act as PCR primers 
in a process of PCR-based cloning of the sequence from a bacterial genomic source. It is 
recognised that such sequences will also have utility in diagnosis of the stage of infection 
and type of infection the pathogen has attained. 

To obtain the polynucleotide encoding the protein using the DNA sequence given
herein, typically a library of clones of chromosomal DNA of S. pneumoniae 0100993 in
E. coli or some other suitable host is probed with a radiolabeled oligonucleotide, preferably a
17-mer or longer, derived from the partial sequence. Clones carrying DNA identical to that
of the probe can then be distinguished using high stringency washes. By sequencing the
individual clones thus identified with sequencing primers designed from the original
sequence, it is then possible to extend the sequence in both directions to determine the full
gene sequence. Conveniently, such sequencing is performed using denatured double-stranded
DNA prepared from a plasmid clone. Suitable techniques are described by
Maniatis, T., Fritsch, E.F. and Sambrook, J. in MOLECULAR CLONING, A Laboratory
Manual, 2nd edition, 1989, Cold Spring Harbor Laboratory (see: Screening By
Hybridization 1.90 and Sequencing Denatured Double-Stranded DNA Templates 13.70).
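Designing the 17-mer probe is a small string operation: take a 17-nucleotide stretch of the known sequence and write its reverse complement, which is the strand that base-pairs with it. The sketch below also reports the Wallace rule of thumb, Tm = 2(A+T) + 4(G+C) in degrees C, as a rough guide for short oligonucleotides; that rule is a swapped-in heuristic, not something the text specifies. The example sequence and start position are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse complement of an A/C/G/T sequence."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def wallace_tm(oligo):
    """Rule-of-thumb melting temperature (deg C) for short oligos: 2*(A+T) + 4*(G+C)."""
    s = oligo.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))


def design_probe(sequence, start, length=17):
    """Oligonucleotide complementary to sequence[start:start + length]."""
    target = sequence[start:start + length]
    if len(target) < length:
        raise ValueError("not enough sequence for a probe of this length")
    return reverse_complement(target)


seq = "ATGAAAACCGCTTATATTGCAAAACAGCGT"  # hypothetical partial sequence
probe = design_probe(seq, 0)
print(probe, wallace_tm(probe))  # ATATAAGCGGTTTTCAT 44
```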

Moreover, another aspect of the invention relates to isolated polynucleotides that encode
the polypeptides of the invention having a deduced amino acid sequence selected from the group 
consisting of the sequences in Table 1 and polynucleotides closely related thereto and variants 
thereof. 

Using the information provided herein, such as the polynucleotide sequences set out in 
Table 1 , a polynucleotide of the invention encoding polypeptide may be obtained using standard 
cloning and screening methods, such as those for cloning and sequencing chromosomal DNA 
fragments from bacteria using Streptococcus pneumoniae 0100993 cells as starting material, 
followed by obtaining a full length clone. For example, to obtain a polynucleotide sequence of 
the invention, such as a sequence set forth in Table 1, typically a library of clones of 
chromosomal DNA of Streptococcus pneumoniae 0100993 in E. coli or some other suitable
host is probed with a radiolabeled oligonucleotide, preferably a 17-mer or longer, derived
from a partial sequence. Clones carrying DNA identical to that of the probe can then be
distinguished using stringent conditions. By sequencing the individual clones thus
identified with sequencing primers designed from the original sequence, it is then possible to
extend the sequence in both directions to determine the full gene sequence. Conveniently,
such sequencing is performed using denatured double-stranded DNA prepared from a
plasmid clone. Suitable techniques are described by Maniatis, T., Fritsch, E.F. and
Sambrook, J., MOLECULAR CLONING, A LABORATORY MANUAL, 2nd Ed.; Cold Spring
Harbor Laboratory Press, Cold Spring Harbor, New York (1989) (see in particular Screening
By Hybridization 1.90 and Sequencing Denatured Double-Stranded DNA Templates
13.70). Illustrative of the invention, the polynucleotides set out in Table 1 were discovered in a
DNA library derived from Streptococcus pneumoniae 0100993.

The DNA sequences set out in Table 1 each contain at least one open reading frame
encoding a protein having at least about the number of amino acid residues set forth in Table 1.
The start and stop codons of each open reading frame (herein "ORF") DNA are the first three and
the last three nucleotides of each polynucleotide set forth in Table 1.
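Because each ORF is defined by its first three and last three nucleotides, a sequence can be checked mechanically: its length must be a multiple of three, it must open with a start codon, close with a stop codon, and contain no in-frame stop in between. The sketch below assumes ATG, GTG and TTG as the accepted bacterial start codons and translates the initiating codon as Met; it uses the standard genetic code and is illustrative only.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
START_CODONS = {"ATG", "GTG", "TTG"}  # assumption: common bacterial initiation codons
STOP_CODONS = {"TAA", "TAG", "TGA"}


def translate_orf(dna):
    """Check that dna runs start codon ... stop codon in frame and return the protein.

    The initiating codon is translated as Met, as is usual for GTG/TTG starts in bacteria.
    """
    s = dna.upper()
    if len(s) % 3 != 0 or len(s) < 6:
        raise ValueError("length must be a multiple of 3 and cover a start and a stop codon")
    codons = [s[i:i + 3] for i in range(0, len(s), 3)]
    if codons[0] not in START_CODONS:
        raise ValueError("first three nucleotides are not a start codon")
    if codons[-1] not in STOP_CODONS:
        raise ValueError("last three nucleotides are not a stop codon")
    if any(c in STOP_CODONS for c in codons[1:-1]):
        raise ValueError("internal in-frame stop codon")
    return "M" + "".join(CODON_TABLE[c] for c in codons[1:-1])


protein = translate_orf("GTGAAAGCTTAA")
print(protein, len(protein))  # MKA 3
```

The length of the returned protein is what would be compared against the residue counts listed in Table 1.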

Certain polynucleotides and polypeptides of the invention are structurally related to 
known proteins as set forth in Table 1. These proteins exhibit greatest homology to the 
homologue listed in Table 1 from among the known proteins. 
The invention provides a polynucleotide sequence identical over its entire length to each
coding sequence in Table 1. Also provided by the invention is the coding sequence for the
mature polypeptide or a fragment thereof, by itself, as well as the coding sequence for the mature
polypeptide or a fragment in reading frame with other coding sequence, such as those encoding a
leader or secretory sequence, or a pre-, pro- or prepro-protein sequence. The polynucleotide
may also contain non-coding sequences, including, for example, but not limited to, non-coding 5'
and 3' sequences, such as the transcribed, non-translated sequences, termination signals,
ribosome binding sites, sequences that stabilize mRNA, introns and polyadenylation signals, as
well as additional coding sequence which encodes additional amino acids. For example, a marker
sequence that facilitates purification of the fused polypeptide can be encoded. In certain
embodiments of the invention, the marker sequence is a hexa-histidine peptide, as provided in
the pQE vector (Qiagen, Inc.) and described in Gentz et al., Proc. Natl. Acad. Sci., USA 86:
821-824 (1989), or an HA tag (Wilson et al., Cell 37: 761 (1984)). Polynucleotides of the invention
also include, but are not limited to, polynucleotides comprising a structural gene and its naturally
associated sequences that control gene expression.
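The key requirement for a marker such as a hexa-histidine tag is that its coding sequence stays in reading frame with the coding sequence it is fused to. In the pQE vectors the tag is supplied by the vector itself; the string-level sketch below only illustrates in-frame fusion, and placing the tag at the C-terminus, just before the stop codon, is an assumption chosen for simplicity.

```python
HIS6_CODONS = "CATCACCATCACCATCAC"  # six histidine codons (CAT and CAC both encode His)
STOP_CODONS = {"TAA", "TAG", "TGA"}


def add_c_terminal_his6(orf):
    """Insert a hexa-histidine coding sequence in frame, just before the stop codon."""
    s = orf.upper()
    if len(s) % 3 != 0 or s[-3:] not in STOP_CODONS:
        raise ValueError("expected an in-frame ORF ending in a stop codon")
    return s[:-3] + HIS6_CODONS + s[-3:]


print(add_c_terminal_his6("ATGAAAGCTTAA"))  # ATGAAAGCTCATCACCATCACCATCACTAA
```

Because the inserted 18 nucleotides are a multiple of three, the downstream stop codon remains in frame.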

The invention also includes polynucleotides of the formula:

$$X-(R_1)_n-(R_2)-(R_3)_n-Y$$

wherein, at the 5' end of the molecule, X is hydrogen, and at the 3' end of the molecule, Y is
hydrogen or a metal, R1 and R3 are any nucleic acid residue, n is an integer between 1 and 3000,
and R2 is a nucleic acid sequence of the invention, particularly a nucleic acid sequence selected
from the group set forth in Table 1. In the polynucleotide formula above, R2 is oriented so that
its 5' end residue is at the left, bound to R1, and its 3' end residue is at the right, bound to R3.
Any stretch of nucleic acid residues denoted by either R group, where n is greater than 1, may be
either a heteropolymer or a homopolymer, preferably a heteropolymer. In a preferred
embodiment n is an integer between 1 and 1000, or 2000 or 3000.

The term "polynucleotide encoding a polypeptide" as used herein encompasses 
polynucleotides that include a sequence encoding a polypeptide of the invention, particularly a 
bacterial polypeptide and more particularly a polypeptide of the Streptococcus pneumoniae 
having an amino acid sequence set out in Table 1 . The term also encompasses polynucleotides 
that include a single continuous region or discontinuous regions encoding the polypeptide (for 
example, interrupted by integrated phage or an insertion sequence or editing) together with 
additional regions, that also may contain coding and/or non-coding sequences. 

The invention further relates to variants of the polynucleotides described herein that
encode variants of the polypeptide having the deduced amino acid sequence of Table 1.
Variants that are fragments of the polynucleotides of the invention may be used to synthesize
full-length polynucleotides of the invention.

Further particularly preferred embodiments are polynucleotides encoding polypeptide 
variants, that have the amino acid sequence of a polypeptide of Table 1 in which several, a few, 5 
to 10, 1 to 5, 1 to 3, 2, 1 or no amino acid residues are substituted, deleted or added, in any 
combination. Especially preferred among these are silent substitutions, additions and deletions, 
that do not alter the properties and activities of such polynucleotide. 
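One narrow, mechanically checkable sense of a silent change is a synonymous nucleotide substitution: the variant differs from the reference but encodes the identical amino acid sequence. The sketch below covers only equal-length substitutions under the standard genetic code; it does not judge whether an amino acid change, addition or deletion alters the properties of the protein, which is the broader sense used in the text.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds):
    """Translate an in-frame coding sequence with the standard code ('*' = stop)."""
    s = cds.upper()
    if len(s) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[s[i:i + 3]] for i in range(0, len(s), 3))


def is_silent(reference, variant):
    """True if variant differs from reference but encodes the identical amino acid sequence."""
    return reference.upper() != variant.upper() and translate(reference) == translate(variant)


print(is_silent("ATGCTGAAA", "ATGCTTAAA"))  # True: CTG and CTT both encode Leu
print(is_silent("ATGCTGAAA", "ATGCCGAAA"))  # False: CCG encodes Pro
```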

Further preferred embodiments of the invention are polynucleotides that are at least 
50%, 60% or 70% identical over their entire length to a polynucleotide encoding a polypeptide 
having the amino acid sequence set out in Table 1 , and polynucleotides that are complementary 
to such polynucleotides. Alternatively, most highly preferred are polynucleotides that comprise a 
region that is at least 80% identical over its entire length to a polynucleotide encoding a 
polypeptide of the deposited strain and polynucleotides complementary thereto. In this regard, 
polynucleotides at least 90% identical over their entire length to the same are particularly 
preferred, and among these particularly preferred polynucleotides, those with at least 95% are 
especially preferred. Furthermore, those with at least 97% are highly preferred among those with 
at least 95%, and among these those with at least 98% and at least 99% are particularly highly 
preferred, with at least 99% being the more preferred. 

A preferred embodiment is an isolated polynucleotide comprising a polynucleotide 
sequence selected from the group consisting of: a polynucleotide having at least a 50% identity 
to a polynucleotide encoding a polypeptide comprising the amino acid sequence of Table 1 and 
obtained from a prokaryotic species other than S. pneumoniae; and a polynucleotide encoding a 
polypeptide comprising an amino acid sequence which is at least 50% identical to the amino acid 
sequence of Table 1 and obtained from a prokaryotic species other than S. pneumoniae. 

Preferred embodiments are polynucleotides that encode polypeptides that retain 
substantially the same biological function or activity as the mature polypeptide encoded by the 
DNA of Table 1. 

The invention further relates to polynucleotides that hybridize to the herein above- 
described sequences. In this regard, the invention especially relates to polynucleotides that 
hybridize under stringent conditions to the herein above-described polynucleotides. As herein 
used, the terms "stringent conditions" and "stringent hybridization conditions" mean 
hybridization will occur only if there is at least 95% and preferably at least 97% identity between 
the sequences. An example of stringent hybridization conditions is overnight incubation at
42°C in a solution comprising: 50% formamide, 5x SSC (1x SSC is 150 mM NaCl, 15 mM trisodium
citrate), 50 mM sodium phosphate (pH 7.6), 5x Denhardt's solution, 10% dextran sulfate,
and 20 micrograms/ml denatured, sheared salmon sperm DNA, followed by washing the
hybridization support in 0.1x SSC at about 65°C. Hybridization and wash conditions are
well known and exemplified in Sambrook, et al., Molecular Cloning: A Laboratory Manual,
Second Edition, Cold Spring Harbor, N.Y., (1989), particularly Chapter 11 therein.
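The SSC strengths in the example conditions scale linearly from the 1x composition (150 mM NaCl, 15 mM trisodium citrate). The following minimal Python sketch works out both the salt concentrations and the volume of stock needed; preparing working solutions from a 20x SSC stock is an assumption, though a common one.

```python
STOCK_X = 20.0         # assumption: working solutions are diluted from a 20x SSC stock
NACL_1X_MM = 150.0     # 1x SSC contains 150 mM NaCl
CITRATE_1X_MM = 15.0   # and 15 mM trisodium citrate


def ssc_recipe(target_x, final_volume_ml):
    """Volume of 20x stock needed, and the resulting NaCl / citrate concentrations (mM)."""
    if not 0 < target_x <= STOCK_X:
        raise ValueError("target strength must be above 0 and no more than the stock strength")
    return {
        "stock_ml": final_volume_ml * target_x / STOCK_X,
        "nacl_mM": NACL_1X_MM * target_x,
        "citrate_mM": CITRATE_1X_MM * target_x,
    }


print(ssc_recipe(5, 100))    # hybridization-buffer strength
print(ssc_recipe(0.1, 500))  # high-stringency wash strength
```

At 5x the salt is 750 mM NaCl, whereas the 0.1x wash drops it to 15 mM; that lower ionic strength, together with the 65°C wash temperature, is what removes imperfectly matched duplexes.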

The invention also provides a polynucleotide consisting essentially of a 
polynucleotide sequence obtainable by screening an appropriate library containing the 
complete gene for a polynucleotide sequence set forth in Table 1 under stringent 
hybridization conditions with a probe having the sequence of said polynucleotide sequence 
or a fragment thereof; and isolating said DNA sequence. Fragments useful for obtaining 
such a polynucleotide include, for example, probes and primers described elsewhere herein. 

As discussed additionally herein regarding polynucleotide assays of the invention, for 
instance, polynucleotides of the invention as discussed above may be used as a hybridization
probe for RNA, cDNA and genomic DNA to isolate full-length cDNAs and genomic clones 
encoding a polypeptide and to isolate cDNA and genomic clones of other genes that have a high 
sequence similarity to a polynucleotide set forth in Table 1 . Such probes generally will comprise 
at least 15 bases. Preferably, such probes will have at least 30 bases and may have at least 50 
bases. Particularly preferred probes will have at least 30 bases and will have 50 bases or less. 

For example, the coding region of each gene that comprises or is comprised by a 
polynucleotide set forth in Table 1 may be isolated by screening using a DNA sequence provided 
in Table 1 to synthesize an oligonucleotide probe. A labeled oligonucleotide having a sequence 
complementary to that of a gene of the invention is then used to screen a library of cDNA, 
genomic DNA or mRNA to determine which members of the library the probe hybridizes to. 
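The screening step can be mimicked in silico, which can help when choosing a probe before committing to a library screen. The sketch below treats hybridization as a perfect match to the probe on either strand of each clone's insert; real hybridization tolerates some mismatches depending on stringency, so this is a simplifying assumption. Clone IDs and sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse complement of an A/C/G/T sequence."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def hybridizing_clones(probe, library):
    """IDs of clones containing a perfect match to the probe on either strand.

    library maps a (hypothetical) clone ID to one strand of its insert sequence.
    """
    target = reverse_complement(probe)  # the strand the probe base-pairs with
    p = probe.upper()
    return sorted(cid for cid, seq in library.items() if target in seq.upper() or p in seq.upper())


probe = "ATATAAGCGGTTTTCAT"
library = {
    "clone1": "GGGATGAAAACCGCTTATATCCC",  # gene strand, same orientation as the target
    "clone2": "ATATAAGCGGTTTTCATGG",      # insert stored in the opposite orientation
    "clone3": "ACGTACGTACGTACGTACGT",     # unrelated insert
}
print(hybridizing_clones(probe, library))  # ['clone1', 'clone2']
```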

The polynucleotides and polypeptides of the invention may be employed, for example, 
as research reagents and materials for discovery of treatments of and diagnostics for disease, 
particularly human disease, as further discussed herein relating to polynucleotide assays. 

Polynucleotides of the invention that are oligonucleotides derived from a
polynucleotide or polypeptide sequence set forth in Table 1 may be used in the processes
described herein, but preferably for PCR, to determine whether or not the
polynucleotides identified herein in whole or in part are transcribed in bacteria in infected 
tissue. It is recognized that such sequences will also have utility in diagnosis of the stage of 
infection and type of infection the pathogen has attained. 

The invention also provides polynucleotides that may encode a polypeptide that is the 
mature protein plus additional amino or carboxyl-terminal amino acids, or amino acids interior to 
the mature polypeptide (when the mature form has more than one polypeptide chain, for
instance). Such sequences may play a role in processing of a protein from precursor to a mature 
form, may allow protein transport, may lengthen or shorten protein half-life or may facilitate 
manipulation of a protein for assay or production, among other things. As generally is the case 
in vivo, the additional amino acids may be processed away from the mature protein by cellular 
enzymes. 

A precursor protein, having the mature form of the polypeptide fused to one or more 
prosequences may be an inactive form of the polypeptide. When prosequences are removed such 
inactive precursors generally are activated. Some or all of the prosequences may be removed 
before activation. Generally, such precursors are called proproteins. 

In addition to the standard A, G, C, T/U representations for nucleic acid bases, the 
term "N" is also used. "N" means that any of the four DNA or RNA bases may appear at 
such a designated position in the DNA or RNA sequence, except it is preferred that N is not 
a base that when taken in combination with adjacent nucleotide positions, when read in the 
correct reading frame, would have the effect of generating a premature termination codon in 
such reading frame. 
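The preference stated above can be applied mechanically: for an "N" at a given position, try each of the four bases and reject those that would turn the codon containing it into an in-frame stop before the end of the reading frame. The following minimal Python sketch assumes the sequence is read in frame from its first nucleotide and contains only the one N under consideration; the final codon is treated as the natural stop position and is left unrestricted.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def allowed_bases_for_n(cds, pos):
    """Bases that may replace the 'N' at 0-based pos without creating a premature stop."""
    s = cds.upper()
    if s[pos] != "N":
        raise ValueError("no N at the given position")
    start = pos - pos % 3            # first nucleotide of the codon containing N
    codon = s[start:start + 3]
    if len(codon) < 3:
        raise ValueError("N lies in an incomplete codon")
    is_last_codon = start + 3 == len(s)
    allowed = []
    for base in "ACGT":
        candidate = codon[:pos - start] + base + codon[pos - start + 1:]
        if is_last_codon or candidate not in STOP_CODONS:
            allowed.append(base)
    return allowed


print(allowed_bases_for_n("ATGNAAGCTTAA", 3))  # ['A', 'C', 'G']: T would give TAA
```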

In sum, a polynucleotide of the invention may encode a mature protein, a mature protein 
plus a leader sequence (which may be referred to as a preprotein), a precursor of a mature protein 
having one or more prosequences that are not the leader sequences of a preprotein, or a 
preproprotein, which is a precursor to a proprotein, having a leader sequence and one or more 
prosequences, which generally are removed during processing steps that produce active and 
mature forms of the polypeptide. 

Vectors, host cells, expression 

The invention also relates to vectors that comprise a polynucleotide or polynucleotides 
of the invention, host cells that are genetically engineered with vectors of the invention and the 
production of polypeptides of the invention by recombinant techniques. Cell-free translation 
systems can also be employed to produce such proteins using RNAs derived from the DNA 
constructs of the invention. 

For recombinant production, host cells can be genetically engineered to incorporate 
expression systems or portions thereof or polynucleotides of the invention. Introduction of a 
polynucleotide into the host cell can be effected by methods described in many standard 
laboratory manuals, such as Davis et al., BASIC METHODS IN MOLECULAR BIOLOGY, 
(1986) and Sambrook et al., MOLECULAR CLONING: A LABORATORY MANUAL, 2nd Ed., 
Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (1989), such as, calcium 
phosphate transfection, DEAE-dextran mediated transfection, transvection, microinjection,
cationic lipid-mediated transfection, electroporation, transduction, scrape loading, ballistic 
introduction and infection. 

Representative examples of appropriate hosts include bacterial cells, such as 
streptococci, staphylococci, enterococci, E. coli, Streptomyces and Bacillus subtilis cells; fungal
cells, such as yeast cells and Aspergillus cells; insect cells such as Drosophila S2 and Spodoptera 
Sf9 cells; animal cells such as CHO, COS, HeLa, C127, 3T3, BHK, 293 and Bowes melanoma 
cells; and plant cells. 

A great variety of expression systems can be used to produce the polypeptides of the 
invention. Such vectors include, among others, chromosomal, episomal and virus-derived 
vectors, e.g., vectors derived from bacterial plasmids, from bacteriophage, from transposons, 
from yeast episomes, from insertion elements, from yeast chromosomal elements, from viruses 
such as baculoviruses, papova viruses, such as SV40, vaccinia viruses, adenoviruses, fowl pox 
viruses, pseudorabies viruses and retroviruses, and vectors derived from combinations thereof, 
such as those derived from plasmid and bacteriophage genetic elements, such as cosmids and 
phagemids. The expression system constructs may contain control regions that regulate as well 
as engender expression. Generally, any system or vector suitable to maintain, propagate or 
express polynucleotides and/or to express a polypeptide in a host may be used for expression in 
this regard. The appropriate DNA sequence may be inserted into the expression system by any 
of a variety of well-known and routine techniques, such as, for example, those set forth in 
Sambrook et al., MOLECULAR CLONING, A LABORATORY MANUAL, (supra).

For secretion of the translated protein into the lumen of the endoplasmic reticulum, into 
the periplasmic space or into the extracellular environment, appropriate secretion signals may be 
incorporated into the expressed polypeptide. These signals may be endogenous to the 
polypeptide or they may be heterologous signals. 

Polypeptides of the invention can be recovered and purified from recombinant cell 
cultures by well-known methods including ammonium sulfate or ethanol precipitation, acid 
extraction, anion or cation exchange chromatography, phosphocellulose chromatography, 
hydrophobic interaction chromatography, affinity chromatography, hydroxylapatite 
chromatography, and lectin chromatography. Most preferably, high performance liquid 
chromatography is employed for purification. Well known techniques for refolding protein may 
be employed to regenerate active conformation when the polypeptide is denatured during 
isolation and/or purification.

Diagnostic Assays 
This invention is also related to the use of the polynucleotides of the invention as
diagnostic reagents. Detection of such polynucleotides in a eukaryote, particularly a mammal,
and especially a human, will provide a diagnostic method for diagnosis of a disease. Eukaryotes
(herein also "individual(s)"), particularly mammals, and especially humans, infected with an
organism comprising a gene of the invention may be detected at the nucleic acid level by a 
variety of techniques. 

Nucleic acids for diagnosis may be obtained from an infected individual's cells and 
tissues, such as bone, blood, muscle, cartilage, and skin. Genomic DNA may be used directly for 
detection or may be amplified enzymatically by using PCR or other amplification technique prior 
to analysis. RNA or cDNA may also be used in the same ways. Using amplification, 
characterization of the species and strain of prokaryote present in an individual may be made by
an analysis of the genotype of the prokaryote gene. Deletions and insertions can be detected by a
change in size of the amplified product in comparison to the genotype of a reference sequence.
Point mutations can be identified by hybridizing amplified DNA to labeled polynucleotide
sequences of the invention. Perfectly matched sequences can be distinguished from mismatched
duplexes by RNase digestion or by differences in melting temperatures. DNA sequence
differences may also be detected by alterations in the electrophoretic mobility of the DNA
fragments in gels, with or without denaturing agents, or by direct DNA sequencing. See, e.g.,
Myers et al., Science, 230: 1242 (1985). Sequence changes at specific locations also may be
revealed by nuclease protection assays, such as RNase and S1 protection, or a chemical cleavage
method. See, e.g., Cotton et al., Proc. Natl. Acad. Sci., USA, 85: 4397-4401 (1988).
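Once an amplified product has been sequenced, the comparisons described above reduce to two cases: a length difference from the reference points to an insertion or deletion, while an equal-length product can be scanned for point differences. The following minimal Python sketch uses hypothetical sequences; real analysis would align the product first rather than compare it position by position.

```python
def compare_to_reference(amplicon, reference):
    """Summarise how an amplified product differs from the reference sequence.

    A length difference is reported as a putative insertion (+) or deletion (-);
    for equal-length products, mismatches are listed as
    (0-based position, reference base, sample base).
    """
    a, r = amplicon.upper(), reference.upper()
    if len(a) != len(r):
        return {"size_change": len(a) - len(r), "mismatches": None}
    mismatches = [(i, rb, ab) for i, (rb, ab) in enumerate(zip(r, a)) if rb != ab]
    return {"size_change": 0, "mismatches": mismatches}


print(compare_to_reference("ACGTTA", "ACGTCA"))  # one point difference at position 4
print(compare_to_reference("ACGTA", "ACGTCA"))   # one-base deletion
```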

Cells carrying mutations or polymorphisms in the gene of the invention may also be
detected at the DNA level by a variety of techniques, to allow for serotyping, for example. For
instance, RT-PCR can be used to detect mutations. It is particularly preferred to use RT-PCR
in conjunction with automated detection systems, such as, for example, GeneScan. RNA or
cDNA may also be used for the same purpose, via PCR or RT-PCR. As an example, PCR primers
complementary to a nucleic acid encoding a polypeptide of the invention can be used to identify 
and analyze mutations. These primers may be used for, among other things, amplifying a DNA 
of the invention isolated from a sample derived from an individual. The primers may be used to 
amplify the gene isolated from an infected individual such that the gene may then be subject to 
various techniques for elucidation of the DNA sequence. In this way, mutations in the DNA 
sequence may be detected and used to diagnose infection and to serotype and/or classify the 
infectious agent. 
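Before primers are used on patient samples, it can help to predict the product they should give from a known sequence. The following minimal in-silico PCR sketch returns the stretch from the forward primer through the reverse-primer binding site. Because the reverse primer anneals to the opposite strand, its site on the given strand is its reverse complement. Only perfect matches on one template strand are considered, and the template and primers are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse complement of an A/C/G/T sequence."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def in_silico_amplicon(template, forward_primer, reverse_primer):
    """Predicted PCR product (forward primer through reverse-primer site), or None."""
    t = template.upper()
    fwd = forward_primer.upper()
    rev_site = reverse_complement(reverse_primer)  # reverse primer's site on this strand
    start = t.find(fwd)
    if start == -1:
        return None
    end = t.find(rev_site, start + len(fwd))
    if end == -1:
        return None
    return t[start:end + len(rev_site)]


template = "CCCATGAAAACCGCTTATATTGCAAAACAGCGTGGG"
product = in_silico_amplicon(template, "ATGAAAACC", "ACGCTGTTTT")
print(product, len(product))  # ATGAAAACCGCTTATATTGCAAAACAGCGT 30
```

A product of unexpected length from a clinical isolate, compared with this prediction, is the size-change signal described in the preceding paragraphs.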
The invention further provides a process for diagnosing disease, preferably bacterial 
infections, more preferably infections by Streptococcus pneumoniae, and most preferably 
disease, comprising determining, from a sample derived from an individual, an increased level 
of expression of a polynucleotide having the sequence of Table 1. Increased or decreased 
expression of a polynucleotide of the invention can be measured using any one of the 
methods well known in the art for the quantitation of polynucleotides, such as, for example, 
amplification, PCR, RT-PCR, RNase protection, Northern blotting and other hybridization 
methods. 
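As one common way of expressing RT-PCR results as an increase or decrease in expression, the comparative threshold-cycle (2^-ΔΔCt) method can be sketched as follows. This method is not specified in the text above and is offered here as an assumption; it presumes that the target transcript and a reference (housekeeping) transcript amplify with similar, near-100% efficiency. The function name and Ct values are hypothetical.

```python
def fold_change_ddct(ct_target_sample: float, ct_ref_sample: float,
                     ct_target_control: float, ct_ref_control: float) -> float:
    """Relative expression of a target transcript by the 2^-ddCt method.

    Each delta-Ct normalizes the target to a reference transcript in the same
    sample; the difference between test sample and control gives ddCt.
    Assumes similar, near-100% amplification efficiencies.
    """
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_control = ct_target_control - ct_ref_control
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: relative to the reference transcript, the target
# crosses threshold 3 cycles earlier in the test sample than in the control.
print(fold_change_ddct(22.0, 18.0, 25.0, 18.0))  # 8.0, i.e. about 8-fold higher
```

A result above 1 corresponds to increased expression relative to the control, and a result below 1 to decreased expression.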

In addition, a diagnostic assay in accordance with the invention for detecting overexpression 
of a polypeptide of the invention compared to normal control tissue samples may be 
used to detect the presence of an infection, for example. Assay techniques that can be used to 
determine levels of a protein in a sample derived from a host are well known to those of skill in 
the art. Such assay methods include radioimmunoassays, competitive-binding assays, Western 
blot analysis and ELISA assays. 
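For ELISA-type assays, protein levels are commonly read off a standard curve. The sketch below assumes a four-parameter logistic (4PL) curve whose parameters a, b, c and d have already been fitted to standards of known concentration; the function names and parameter values are hypothetical, and the curve fitting itself is not shown.

```python
def four_pl(x: float, a: float, b: float, c: float, d: float) -> float:
    """Four-parameter logistic: assay response at concentration x.

    a: response at zero concentration, d: response at saturating concentration,
    c: concentration at the inflection point, b: slope factor.
    """
    return d + (a - d) / (1.0 + (x / c) ** b)


def concentration_from_response(y: float, a: float, b: float, c: float, d: float) -> float:
    """Invert the 4PL curve to estimate concentration from a measured response."""
    low, high = min(a, d), max(a, d)
    if not low < y < high:
        raise ValueError("response lies outside the range covered by the standard curve")
    return c * ((a - d) / (y - d) - 1.0) ** (1.0 / b)


# Hypothetical fitted parameters (e.g. absorbance versus ng/ml)
params = dict(a=0.05, b=1.2, c=10.0, d=2.0)
od = four_pl(3.7, **params)
print(round(concentration_from_response(od, **params), 6))  # 3.7
```

Rejecting responses outside the curve's range avoids extrapolating beyond the standards.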

Antibodies 

The polypeptides of the invention or variants thereof, or cells expressing them can be 
used as an immunogen to produce antibodies immunospecific for such polypeptides. 
"Antibodies" as used herein includes monoclonal and polyclonal antibodies, chimeric, single 
chain, simianized antibodies and humanized antibodies, as well as Fab fragments, including the 
products of an Fab immunoglobulin expression library. 

Antibodies generated against the polypeptides of the invention can be obtained by 
administering the polypeptides or epitope-bearing fragments, analogues or cells to an animal, 
preferably a nonhuman animal, using routine protocols. For preparation of monoclonal antibodies, any 
technique known in the art that provides antibodies produced by continuous cell line cultures can 
be used. Examples include various techniques, such as those in Kohler, G. and Milstein, C., 
Nature 256: 495-497 (1975); Kozbor et al., Immunology Today 4: 72 (1983); Cole et al., pp. 77-96 
in MONOCLONAL ANTIBODIES AND CANCER THERAPY, Alan R. Liss, Inc. (1985). 

Techniques for the production of single chain antibodies (U.S. Patent No. 4,946,778) 
can be adapted to produce single chain antibodies to polypeptides of this invention. Also, 
transgenic mice, or other organisms such as other mammals, may be used to express humanized 
antibodies. 

Alternatively, phage display technology may be utilized to select antibody genes 
with binding activities towards the polypeptide, either from repertoires of PCR-amplified v-genes 
of lymphocytes from humans screened for possessing recognition of a polypeptide of 
the invention, or from naive libraries (McCafferty, J. et al., (1990) Nature 348, 552-554; 
Marks, J. et al., (1992) Biotechnology 10, 779-783). The affinity of these antibodies can 
also be improved by chain shuffling (Clackson, T. et al., (1991) Nature 352, 624-628). 

If two antigen-binding domains are present, each domain may be directed against a 
different epitope; such antibodies are termed 'bispecific' antibodies. 

The above-described antibodies may be employed to isolate or to identify clones 
expressing the polypeptides, or to purify the polypeptides by affinity chromatography. 

Thus, among others, antibodies against a polypeptide of the invention may be employed 
to treat disease. 

Polypeptide variants include antigenically, epitopically or immunologically 
equivalent variants, which form a particular aspect of this invention. The term "antigenically 
equivalent derivative" as used herein encompasses a polypeptide or its equivalent which 
will be specifically recognized by certain antibodies which, when raised to the protein or 
polypeptide according to the invention, interfere with the immediate physical interaction 
between pathogen and mammalian host. The term "immunologically equivalent derivative" 
as used herein encompasses a peptide or its equivalent which, when used in a suitable 
formulation to raise antibodies in a vertebrate, elicits antibodies that interfere with the 
immediate physical interaction between pathogen and mammalian host. 

The polypeptide, such as an antigenically or immunologically equivalent derivative 
or a fusion protein thereof, may be used as an antigen to immunize a mouse or other animal such 
as a rat or chicken. The fusion protein may provide stability to the polypeptide. The 
antigen may be associated, for example by conjugation, with an immunogenic carrier 
protein, for example bovine serum albumin (BSA) or keyhole limpet haemocyanin (KLH). 
Alternatively, a multiple antigenic peptide comprising multiple copies of the protein or 
polypeptide, or an antigenically or immunologically equivalent polypeptide thereof, may be 
sufficiently antigenic to improve immunogenicity so as to obviate the use of a carrier. 

Preferably, the antibody or variant thereof is modified to make it less immunogenic 
in the individual. For example, if the individual is human, the antibody may most 
preferably be "humanized", where the complementarity determining region(s) of the 
hybridoma-derived antibody have been transplanted into a human monoclonal antibody, for 
example as described in Jones, P. et al. (1986), Nature 321, 522-525 or Tempest et 
al. (1991), Biotechnology 9, 266-273. 

The use of a polynucleotide of the invention in genetic immunization will 
preferably employ a suitable delivery method such as direct injection of plasmid DNA into 
muscles (Wolff et al., Hum. Mol. Genet. 1992, 1:363; Manthorpe et al., Hum. Gene Ther. 
1993, 4:419), delivery of DNA complexed with specific protein carriers (Wu et al., J. Biol. 
Chem. 1989, 264:16985), coprecipitation of DNA with calcium phosphate (Benvenisty & 
Reshef, PNAS 1986, 83:9551), encapsulation of DNA in various forms of liposomes 
(Kaneda et al., Science 1989, 243:375), particle bombardment (Tang et al., Nature 1992, 
356:152; Eisenbraun et al., DNA Cell Biol. 1993, 12:791) and in vivo infection using cloned 
retroviral vectors (Seeger et al., PNAS 1984, 81:5849). 

Antagonists and agonists - assays and molecules 

Polypeptides of the invention may also be used to assess the binding of small molecule 
substrates and ligands in, for example, cells, cell-free preparations, chemical libraries, and natural 
product mixtures. These substrates and ligands may be natural substrates and ligands or may be 
structural or functional mimetics. See, e.g., Coligan et al., Current Protocols in Immunology 
1(2): Chapter 5 (1991). 

The invention also provides a method of screening compounds to identify those which 
enhance (agonist) or block (antagonist) the action of a polypeptide or polynucleotide of the 
invention, particularly those compounds that are bacteriostatic and/or bactericidal. The method 
of screening may involve high-throughput techniques. For example, to screen for agonists or 
antagonists, a synthetic reaction mix, a cellular compartment, such as a membrane, cell envelope 
or cell wall, or a preparation of any thereof, comprising a polypeptide of the invention and a 
labeled substrate or ligand of such polypeptide is incubated in the absence or the presence of a 
candidate molecule that may be an agonist or antagonist of a polypeptide of the invention. The 
ability of the candidate molecule to agonize or antagonize a polypeptide of the invention is 
reflected in decreased binding of the labeled ligand or decreased production of product from such 
substrate. Molecules that bind gratuitously, i.e., without inducing the effects of a polypeptide of 
the invention, are most likely to be good antagonists. Molecules that bind well and increase the 
rate of product production from substrate are agonists. Detection of the rate or level of production 
of product from substrate may be enhanced by using a reporter system. Reporter systems that 
may be useful in this regard include, but are not limited to, colorimetric labeled substrates 
converted into product, a reporter gene that is responsive to changes in polynucleotide or 
polypeptide activity, and binding assays known in the art. 
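The readout of such a screen can be summarized per candidate as percent inhibition relative to control wells, and the suitability of the assay for high-throughput use is often judged by the Z'-factor. The following Python sketch assumes two kinds of control wells, one with no candidate molecule (full activity) and one with complete inhibition; the function names and signal values are hypothetical.

```python
from statistics import mean, stdev


def percent_inhibition(signal: float, uninhibited_mean: float, fully_inhibited_mean: float) -> float:
    """Percent inhibition of a test well relative to the two control means."""
    return 100.0 * (uninhibited_mean - signal) / (uninhibited_mean - fully_inhibited_mean)


def z_prime(uninhibited: list, fully_inhibited: list) -> float:
    """Z'-factor = 1 - 3 * (sd_high + sd_low) / |mean_high - mean_low|.

    Values close to 1 indicate well-separated controls; values above about 0.5
    are conventionally taken to indicate an assay suitable for screening.
    """
    spread = 3.0 * (stdev(uninhibited) + stdev(fully_inhibited))
    window = abs(mean(uninhibited) - mean(fully_inhibited))
    return 1.0 - spread / window


# Hypothetical product signal (e.g. a colorimetric readout) from control wells
no_candidate = [1000, 1010, 990]   # full activity
full_block = [100, 110, 90]        # complete inhibition
print(round(z_prime(no_candidate, full_block), 3))                     # 0.933
print(percent_inhibition(550, mean(no_candidate), mean(full_block)))  # 50.0
```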

Another example of an assay for antagonists of polypeptides of the invention is a 
competitive assay that combines any such polypeptide and a potential antagonist with a 
compound which binds such polypeptide, natural substrates or ligands, or substrate or ligand 
mimetics, under appropriate conditions for a competitive inhibition assay. A polypeptide of the 
invention can be labeled, such as by radioactivity or a colorimetric compound, such that the 
number of such polypeptide molecules bound to a binding molecule or converted to product can 
be determined accurately to assess the effectiveness of the potential antagonist. 
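Where such a competitive assay yields an IC50 for a potential antagonist, that value depends on the concentration of labeled ligand used. One standard conversion to an inhibition constant, the Cheng-Prusoff relation Ki = IC50 / (1 + [L]/Kd), is sketched below as an assumption; it holds for reversible, competitive binding at equilibrium and requires the Kd of the labeled ligand to be known. The function name and all numbers are hypothetical.

```python
def cheng_prusoff_ki(ic50: float, ligand_conc: float, ligand_kd: float) -> float:
    """Inhibition constant from an IC50 measured in a competitive binding assay.

    Ki = IC50 / (1 + [L] / Kd), where [L] is the labeled-ligand concentration
    and Kd its dissociation constant (all in the same concentration units).
    Valid for reversible, competitive binding at equilibrium.
    """
    if ic50 <= 0 or ligand_kd <= 0 or ligand_conc < 0:
        raise ValueError("IC50 and Kd must be positive and [L] non-negative")
    return ic50 / (1.0 + ligand_conc / ligand_kd)


# Hypothetical values in nM: IC50 = 30, labeled ligand at 10, ligand Kd = 5
print(cheng_prusoff_ki(30.0, 10.0, 5.0))  # 10.0
```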

Potential antagonists include small organic molecules, peptides, polypeptides and 
antibodies that bind to a polynucleotide or polypeptide of the invention and thereby inhibit or 
extinguish its activity. Potential antagonists also may be small organic molecules, a peptide, or a 
polypeptide such as a closely related protein or antibody that binds the same sites on a binding 
molecule without inducing activities induced by a polypeptide of 
the invention, thereby preventing the action of such polypeptide by excluding it from binding. 

Potential antagonists include a small molecule that binds to and occupies the binding 
site of the polypeptide, thereby preventing binding to cellular binding molecules, such that 
normal biological activity is prevented. Examples of small molecules include but are not limited 
to small organic molecules, peptides or peptide-like molecules. Other potential antagonists 
include antisense molecules (see Okano, J. Neurochem. 56: 560 (1991); 
OLIGODEOXYNUCLEOTIDES AS ANTISENSE INHIBITORS OF GENE EXPRESSION, CRC 
Press, Boca Raton, FL (1988), for a description of these molecules). Preferred potential 
antagonists include compounds related to and variants of a polypeptide of the invention. 

Each of the DNA sequences provided herein may be used in the discovery and 
development of antibacterial compounds. The encoded protein, upon expression, can be 
used as a target for the screening of antibacterial drugs. Additionally, the DNA sequences 
encoding the amino terminal regions of the encoded protein or Shine-Dalgarno or other 
translation facilitating sequences of the respective mRNA can be used to construct antisense 
sequences to control the expression of the coding sequence of interest. 
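A minimal sketch of choosing an antisense sequence against the translation-initiation region is given below in Python. It assumes that the mRNA-sense sequence (in DNA letters) and the position of the start codon are known; the function name, the window sizes (10 bases upstream, intended to cover a Shine-Dalgarno-like motif, and 3 bases downstream) and the example sequence are hypothetical. Practical design would also weigh melting temperature, secondary structure and off-target matches, none of which is modelled here.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T, N)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def antisense_to_start_region(sense: str, start_codon_index: int,
                              upstream: int = 10, downstream: int = 3) -> str:
    """Antisense DNA sequence covering the start codon and flanking bases.

    sense: mRNA-sense sequence in DNA letters (T rather than U).
    start_codon_index: 0-based index of the first base of the start codon.
    The window spans `upstream` bases 5' of the start codon, the codon
    itself, and `downstream` bases 3' of it, clipped to the sequence ends.
    """
    begin = max(0, start_codon_index - upstream)
    stop = min(len(sense), start_codon_index + 3 + downstream)
    return reverse_complement(sense[begin:stop])


# Hypothetical sequence: AGGAGG (Shine-Dalgarno-like motif), then a GTG start at index 12
sense_seq = "TTAGGAGGAATTGTGAAAGCT"
print(antisense_to_start_region(sense_seq, 12))  # TTTCACAATTCCTCCT
```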

The invention also provides the use of the polypeptide, polynucleotide or inhibitor 
of the invention to interfere with the initial physical interaction between a pathogen and 
mammalian host responsible for sequelae of infection. In particular, the molecules of the 
invention may be used: in the prevention of adhesion of bacteria, in particular Gram-positive 
bacteria, to mammalian extracellular matrix proteins on in-dwelling devices or to 
extracellular matrix proteins in wounds; to block protein-mediated mammalian cell invasion 
by, for example, initiating phosphorylation of mammalian tyrosine kinases (Rosenshine et 
al., Infect. Immun. 60:2211 (1992)); to block bacterial adhesion between mammalian 
extracellular matrix proteins and bacterial proteins that mediate tissue damage; and to block 
the normal progression of pathogenesis in infections initiated other than by the implantation 
of in-dwelling devices or by other surgical techniques. 
The antagonists and agonists of the invention may be employed, for instance, to inhibit 
and treat disease. 

Helicobacter pylori (herein H. pylori) bacteria infect the stomachs of over one-third 
of the world's population, causing stomach cancer, ulcers, and gastritis (International 
Agency for Research on Cancer (1994) Schistosomes, Liver Flukes and Helicobacter Pylori 
(International Agency for Research on Cancer, Lyon, France); 
http://www.uicc.ch/ecp/ecp2904.htm). Moreover, the International Agency for Research on 
Cancer recently recognized a cause-and-effect relationship between H. pylori and gastric 
adenocarcinoma, classifying the bacterium as a Group I (definite) carcinogen. Preferred 
antimicrobial compounds of the invention found using screens provided by the invention, 
particularly broad-spectrum antibiotics, should be useful in the treatment of H. pylori 
infection. Such treatment should decrease the incidence of H. pylori-induced cancers, such as 
gastrointestinal carcinoma. Such treatment should also cure gastric ulcers and gastritis. 

Vaccines 

Another aspect of the invention relates to a method for inducing an immunological 
response in an individual, particularly a mammal, which comprises inoculating the 
individual with a polypeptide of the invention, or a fragment or variant thereof, adequate to 
produce antibody and/or T cell immune response to protect said individual from infection, 
particularly bacterial infection and most particularly Streptococcus pneumoniae infection. 
Also provided are methods whereby such immunological response slows bacterial 
replication. Yet another aspect of the invention relates to a method of inducing an 
immunological response in an individual which comprises delivering to such individual a 
nucleic acid vector to direct expression of a polynucleotide or polypeptide of the invention, 
or a fragment or a variant thereof, for expressing such polynucleotide or polypeptide, or a 
fragment or a variant thereof, in vivo in order to induce an immunological response, such as 
to produce antibody and/or T cell immune response, including, for example, cytokine-producing 
T cells or cytotoxic T cells, to protect said individual from disease, whether that 
disease is already established within the individual or not. One way of administering the 
gene is by accelerating it into the desired cells as a coating on particles or otherwise. Such 
nucleic acid vector may comprise DNA, RNA, a modified nucleic acid, or a DNA/RNA 
hybrid. 

A further aspect of the invention relates to an immunological composition which, 
when introduced into an individual capable of having induced within it an immunological 
response, induces an immunological response in such individual to a polynucleotide of the 
invention or protein coded therefrom, wherein the composition comprises a recombinant 
polynucleotide or protein coded therefrom comprising DNA which codes for and expresses 
an antigen of said polynucleotide or protein coded therefrom. The immunological response 
may be used therapeutically or prophylactically and may take the form of antibody 
immunity or cellular immunity such as that arising from CTL or CD4+ T cells. 

A polypeptide of the invention or a fragment thereof may be fused with a co-protein 
which may not by itself produce antibodies, but is capable of stabilizing the first protein and 
producing a fused protein which will have immunogenic and protective properties. Thus, the 
fused recombinant protein preferably further comprises an antigenic co-protein, such as 
lipoprotein D from Haemophilus influenzae, glutathione-S-transferase (GST) or 
beta-galactosidase, relatively large co-proteins which solubilize the protein and facilitate 
production and purification thereof. Moreover, the co-protein may act as an adjuvant in the 
sense of providing a generalized stimulation of the immune system. The co-protein may be 
attached to either the amino or carboxy terminus of the first protein. 

Provided by this invention are compositions, particularly vaccine compositions, and 
methods comprising the polypeptides or polynucleotides of the invention and 
immunostimulatory DNA sequences, such as those described in Sato, Y. et al., Science 273: 
352 (1996). 

Also provided by this invention are methods using the described polynucleotide, or 
particular fragments thereof that have been shown to encode non-variable regions of 
bacterial cell surface proteins, in DNA constructs used in genetic immunization 
experiments in animal models of infection with Streptococcus pneumoniae. Such methods will be 
particularly useful for identifying protein epitopes able to provoke a prophylactic or 
therapeutic immune response. It is believed that this approach will allow for the subsequent 
preparation of monoclonal antibodies of particular value, from the requisite organ of the 
animal successfully resisting or clearing infection, for the development of prophylactic 
agents or therapeutic treatments of bacterial infection, particularly Streptococcus pneumoniae 
infection, in mammals, particularly humans. 

The polypeptide may be used as an antigen for vaccination of a host to produce 
specific antibodies which protect against invasion of bacteria, for example by blocking 
adherence of bacteria to damaged tissue. Examples of tissue damage include wounds in 
skin or connective tissue caused, e.g., by mechanical, chemical or thermal damage or by 
implantation of indwelling devices, or wounds in the mucous membranes, such as the 
mouth, mammary glands, urethra or vagina. 
The invention also includes a vaccine formulation which comprises an 
immunogenic recombinant protein of the invention together with a suitable carrier. Since 
the protein may be broken down in the stomach, it is preferably administered parenterally, 
including, for example, administration that is subcutaneous, intramuscular, intravenous, or 
intradermal. Formulations suitable for parenteral administration include aqueous and non-aqueous 
sterile injection solutions which may contain anti-oxidants, buffers, bacteriostats 
and solutes which render the formulation isotonic with the bodily fluid, preferably the 
blood, of the individual; and aqueous and non-aqueous sterile suspensions which may 
include suspending agents or thickening agents. The formulations may be presented in 
unit-dose or multi-dose containers, for example, sealed ampules and vials, and may be 
stored in a freeze-dried condition requiring only the addition of the sterile liquid carrier 
immediately prior to use. The vaccine formulation may also include adjuvant systems for 
enhancing the immunogenicity of the formulation, such as oil-in-water systems and other 
systems known in the art. The dosage will depend on the specific activity of the vaccine 
and can be readily determined by routine experimentation. 

While the invention has been described with reference to certain proteins, such as, 
for example, those set forth in Table 1, it is to be understood that this covers fragments of 
the naturally occurring proteins and similar proteins with additions, deletions or substitutions 
which do not substantially affect the immunogenic properties of the recombinant protein. 

Compositions, kits and administration 

The invention also relates to compositions comprising the polynucleotide or the 
polypeptides discussed above or their agonists or antagonists. The polypeptides of the invention 
may be employed in combination with a non-sterile or sterile carrier or carriers for use with cells, 
tissues or organisms, such as a pharmaceutical carrier suitable for administration to a subject. 
Such compositions comprise, for instance, a media additive or a therapeutically effective amount 
of a polypeptide of the invention and a pharmaceutically acceptable carrier or excipient. Such 
carriers may include, but are not limited to, saline, buffered saline, dextrose, water, glycerol, 
ethanol and combinations thereof. The formulation should suit the mode of administration. The 
invention further relates to diagnostic and pharmaceutical packs and kits comprising one or more 
containers filled with one or more of the ingredients of the aforementioned compositions of the 
invention. 

Polypeptides and other compounds of the invention may be employed alone or in 
conjunction with other compounds, such as therapeutic compounds. 
The pharmaceutical compositions may be administered in any effective, convenient 
manner including, for instance, administration by topical, oral, anal, vaginal, intravenous, 
intraperitoneal, intramuscular, subcutaneous, intranasal or intradermal routes, among others. 

In therapy or as a prophylactic, the active agent may be administered to an 
individual as an injectable composition, for example as a sterile aqueous dispersion, 
preferably isotonic. 

Alternatively the composition may be formulated for topical application 
for example in the form of ointments, creams, lotions, eye ointments, eye drops, ear drops, 
mouthwash, impregnated dressings and sutures and aerosols, and may contain appropriate 
conventional additives, including, for example, preservatives, solvents to assist drug 
penetration, and emollients in ointments and creams. Such topical formulations may also 
contain compatible conventional carriers, for example cream or ointment bases, and ethanol 
or oleyl alcohol for lotions. Such carriers may constitute from about 1% to about 98% by 
weight of the formulation; more usually they will constitute up to about 80% by weight of 
the formulation. 

For administration to mammals, and particularly humans, it is expected that the 
daily dosage level of the active agent will be from 0.01 mg/kg to 10 mg/kg, typically 
around 1 mg/kg. The physician in any event will determine the actual dosage which will be 
most suitable for an individual and will vary with the age, weight and response of the 
particular individual. The above dosages are exemplary of the average case. There can, of 
course, be individual instances where higher or lower dosage ranges are merited, and such 
are within the scope of this invention. 

In-dwelling devices include surgical implants, prosthetic devices and catheters, i.e., 
devices that are introduced to the body of an individual and remain in position for an 
extended time. Such devices include, for example, artificial joints, heart valves, 
pacemakers, vascular grafts, vascular catheters, cerebrospinal fluid shunts, urinary catheters, 
and continuous ambulatory peritoneal dialysis (CAPD) catheters. 

The composition of the invention may be administered by injection to achieve a 
systemic effect against relevant bacteria shortly before insertion of an in-dwelling device. 
Treatment may be continued after surgery during the in-body time of the device. In 
addition, the composition could also be used to broaden perioperative cover for any surgical 
technique to prevent bacterial wound infections, especially Streptococcus pneumoniae 
wound infections. 
Many orthopedic surgeons consider that humans with prosthetic joints should be 
considered for antibiotic prophylaxis before dental treatment that could produce a 
bacteremia. Late deep infection is a serious complication, sometimes leading to loss of the 
prosthetic joint, and is accompanied by significant morbidity and mortality. It may therefore 
be possible to extend the use of the active agent as a replacement for prophylactic 
antibiotics in this situation. 

In addition to the therapy described above, the compositions of this invention may 
be used generally as a wound treatment agent to prevent adhesion of bacteria to matrix 
proteins exposed in wound tissue and for prophylactic use in dental treatment as an 
alternative to, or in conjunction with, antibiotic prophylaxis. 

Alternatively, the composition of the invention may be used to bathe an indwelling 
device immediately before insertion. The active agent will preferably be present at a 
concentration of 1 µg/ml to 10 mg/ml for bathing of wounds or indwelling devices. 

A vaccine composition is conveniently in injectable form. Conventional adjuvants 
may be employed to enhance the immune response. A suitable unit dose for vaccination is 
0.5-5 microgram/kg of antigen, and such dose is preferably administered 1-3 times and 
with an interval of 1-3 weeks. With the indicated dose range, no adverse toxicological 
effects will be observed with the compounds of the invention which would preclude their 
administration to suitable individuals. 

Each reference disclosed herein is incorporated by reference herein in its entirety. 
Any patent application to which this application claims priority is also incorporated by 
reference herein in its entirety. 
TABLES 

Certain pertinent data for preferred polypeptide and polynucleotide embodiments of 
the invention are summarized in Tables 1 and 2. 

Provided in Table 1 are sequence search results providing characterization 
information regarding certain preferred polynucleotides (denoted as "Assembly") and 
polypeptides of the invention encoded thereby. For each polynucleotide in Table 1, there is 
listed the closest homologue of each polypeptide encoded by each ORF in such 
polynucleotide. This determination of homology is based on a comparison of the sequences 
in Table 1 with sequences available in the public domain (see heading entitled 
"Description" for the homologue name). Where no significant homologue was detected, the 
term "unknown" appears after the heading "Description". Preferred polypeptides encoded 
by the ORFs of the invention, particularly full length proteins either obtained using such 
ORFs or encoded entirely by such ORFs, are ones that have a biological function of the 
homologue listed, among other functions. The analysis used to determine each homologue 
listed in Table 1 was BlastP and/or BlastX and/or MPSearch, each of which is well 
known. Also provided in Table 1 is the amino acid sequence encoded by each ORF. An 
"Assembly ID" number provides a convenient way to correlate the polynucleotide sequence 
with the ORF or ORFs it comprises and the polypeptides encoded by these ORFs, as well 
as to correlate such sequences with other pertinent information provided in Tables 1 and 2. 
Following the heading "ORF Predictions" the nucleotides at the beginning and end of the 
ORF sequence are set forth ("Start" and "End" respectively). The direction of translation on 
the polynucleotide depicted is denoted by an "F" for forward or an "R" for reverse (reverse 
being translated on the opposite strand from the one depicted). The length of each amino 
acid sequence is also indicated in a column entitled "Length." Below these data is shown 
the amino acid sequence encoded by the ORF. If a given polynucleotide comprises one 
ORF, then in the column entitled "ORF #" there is the numeral one. If it encodes two, there 
are the numerals one and two in the column, and so on. 
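The conventions above (1-based, inclusive Start and End; "R" meaning translation from End back to Start on the opposite strand; Length counted in codons including the terminal stop, shown as "*") can be illustrated with a short Python sketch. It uses the standard codon table without start-codon substitution, so an initiating GTG is rendered as V, as in the Table 1 translations. The function names and the test sequences are hypothetical and are not Table 1 data.

```python
# Sketch: translate an ORF from an assembly using Table 1's Start/End/Direction
# conventions. Standard codon table; an initiating GTG is translated as V.

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T, N)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate_orf(assembly: str, start: int, end: int, direction: str) -> str:
    """Translate assembly positions start..end (1-based, inclusive).

    direction "F": read the depicted strand from start to end.
    direction "R": read the opposite strand, i.e. from end back to start.
    Codons containing ambiguous bases (e.g. N) are rendered as "X".
    """
    segment = assembly[start - 1:end].upper()
    if direction == "R":
        segment = reverse_complement(segment)
    elif direction != "F":
        raise ValueError("direction must be 'F' or 'R'")
    codons = [segment[i:i + 3] for i in range(0, len(segment) - 2, 3)]
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


print(translate_orf("ATGAAATAA", 1, 9, "F"))      # MK*
print(translate_orf("CCTTATTTCACGG", 3, 11, "R"))  # VK*
```

Under these conventions the Length column equals (End - Start + 1) / 3; for ORF 1 of Assembly ID 3049156, (385 - 236 + 1) / 3 = 50 aa, counting the stop codon.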



TABLE 1 

Assembly ID: 3049156 
Assembly Length: 495bp 

> 3049156 Strep Assembly -- Assembly id#3049156 

CTCGGTGATAGAAATAGTGTAATCATGCTTTTCTCTTCTTATCTATACTTTGCTACTTCT 
ATTATACAAAAAAATAAAGCGCTTGACTAGGGATTTTTAGAAAAAAAGCCTATTTTTTCA 
AGAAAAATAGGCTTTTTGCGAACGATTGACACAATTGGATTTGGTTAATTCACTCTTAAC 
GATGGTTTTAAACGATATATATTTTTATATATGTAAATTAAAAACTTCTTTCCTTTCACT 
TCCTACGACTTTTCAGATACAGATAGCCAAAGAAGTTTTCATAGAGGGCAAAAAAGAGGA 
GGAAGGCATGAAGAAAGAAGGTCTCTGGCAAAATCATAATAACAGGATCCTTGGCTGGAT 
CAAAAAGCCAGGTATCATCTCCCACAAAGAGAATTTGATGGAAAAGAGTAAAGAATTGGT 
CAAAACCAATCAAAACTCCCCCAAGTCCATCATCACAGGTAAGACTACTAGAGCCAGGAG 
ACTTTTTCGATAAAG 

ORF Predictions: 

ORF #  Start  End  Direction  Length
1      236    385  R          50 aa 

> 3049156-1 ORF translation from 236-385, direction R 
VGDDTWLFDPAKDPVIMILPETFFLHAFLLFFALYENFFGYLYLKSRRK* 
Description: 
unknown 

Assembly ID: 3049862 
Assembly Length: 529bp 

> 3049862 Strep Assembly -- Assembly id#3049862 

CTAGAGCAAGTATTTTTCAAACTTTTTCCGAATAAATAGATAGAGCCAGAGAATTTAGTA 
AACCTAGATTTAAAAATGTGCTATAACATAATATATTGAATCTATAATAGTACACCTTGA 
CTGCTAAAATATTTCTATAAATTAATTTGACTTTCCTGATAGAGTTATTCACATCTTATT 
TCAACTCACTATAGAAGGAGGAATAGGAGGATTCTCAGACATCCGGGCATCAGCCCAACT 
AATGATTTGATTGCTAAGAAAATATTCAGCAATCCAGAAATCACTTGTCAATTTATTCGC 
GATATGCTGGACTTGCCAGCAAAAAATGTTGACCATTTTGGAGGGAAGCGATATTCACGT 
ATTACTCTCCATGCCTTACTCAGTGCAGGATTTTTATACCAGTATAGACGTCTTGGCGGA 
GTTGGATAACGGTACTCAAGTAATTATTGAGATTCAAGTCCATCATCAGAATTTTTCATC 
AATCACTTGTGGACTTACCTGTGCAGTCAGGTTAATCAAATCTTGAAAA 

ORF Predictions: 

ORF #  Start  End  Direction  Length
1      383    526  F          48 aa 

> 3049862-1 ORF translation from 383-526, direction F 
VQDFYTSIDVLAELDNGTQVIIEIQVHHQNFSSITCGLTCAVRLIKS* 

Description: 
unknown 

Assembly ID: 3112810 
Assembly Length: 885bp 

> 3112810 Strep Assembly -- Assembly id#3112810 

CTCATCATCTGTCAAAAAGCGTTTCTTAGCAGTCGTGATATCCATAAAATAATCTAATAT 
CACGATTTCCTCATCCGCAAAGAAAGGAAGGCTGACCAACTCCAGTGCCACATCCTTGTA 
AACTACTTCTTGCATATCAAAGTAGGCAAAGTTGAGGTCAGCAGAATCATACCCAATCTG 
TTTCAACACTTGACTCTTCATCACTTCAAACTGACCCTGATCTGTCCCTGTAAATAGGCG 
CAGGCTCGGTAAATTCGATAAAGTCAACTTCTGACTTTCTTCAATGGCTAGCATCGTCTC 
TCCTTTCTTCAGATTTTTCGATTTAATTTAGTCAATATAGCGCAATTTCCCACGGAAATC 
TTCTAAGCTCTCGTAGCCTTTTTCCACCATGATTGCTTTCAGTTCATTGGTAAAGCGGTC 
AAAAGCACTGACGCCTTCTTTGTGAAGGGTCGTTCCCACCTGCACCATACTTGCTCCACA 
GAGGATGTGTTCAAAGGCATCTCGACCAGTCAGAACGCCACCTGTTCCGATAATTTGGAT 
TTGAGGATTTAAACGTTGATAAAAGGCGTGAACATTGGCTAGAGCAGTCGGTTTGATGTA 
TTATCCACCAATTCCACCAAAACCATTCTTAGGCCGAATAACGACAGATTCGTCTTCNNT 
ATAGAGGCCGTTTCCGATAGAGTTAACGCAGTTGACAAACTTGAGCGGATATTTGTTGAA 
AATAGCTGCCGCTTGATCAAAGTGAACAATATCAAAATAAGGTGGCAATTTAATTCCAAG 
AGGTTTGGTGAAGTAAGCAAACACNTCTGCCAAAATCCGGTCTGTTGTCTCAAAATCATA 
GGCAATCTGAGGTTTACCTGGAACATTTGGACAGGAAAGATTTAG 

ORF Predictions: 

ORF #  Start  End  Direction  Length
1      601    804  R          68 aa 

> 3112810-2 ORF translation from 601-804, direction R 
VFAYFTKPLGIKLPPYFDIVHFDQAAAIFNKYPLKFVNCVNSIGNGLYIEDESWIRPKN 
GFGGIGG* 

Description: 

LLCPYRDA NCBI gi: 511014 - Lactococcus lactis. DIHYDROOROTATE 
DEHYDROGENASE (EC 1.3.3.1) (DIHYDROOROTATE OXIDASE) 

Assembly ID: 3112866 
Assembly Length: 925bp 

> 3112866 Strep Assembly -- Assembly id#3112866 

TCTTGGCCAACTGCATGGAGTTCAGCGGTCAATTTCAACGCACCTGAGAAACAGACCCCT 
GCACCCCTGAAATCTCAGGAGACATGATGGTCTGGATGGAATCAATAATGAGAAAGTCTG 
GCTGGATACGCTACCACTTCTGCACGAACACTCTGCATATTGGTCTCTGCATAGAGATAA 
AACTCACTATCAAAATCACCTAAGCGCTCTGCACGTAGTTTAATCTGCTGGGCAGACTCC 
TCCCCACTGACATAGAGAACTGTCCCCACTTGGGACAACTGGGTTGAGACTTGTAGGAGA 
AGAGTTGATTTCCCAATCCCAGGATCCCCACCGATGAGGACGAGACTTTCCTGGTACAAC 
TCCGCCTCCAAGCACACGGTTGAATTCCTCCATCTCCGTCTTGGTTCGATTGACATTGAT 
GGAAGTCACCTCAGCTAGTTTCATGGGCTTGGTTTTCTCACCTGTCAAGGACACACGCGC 
ATTCTTGACCTCGGCAACCTCAACCTCTTCCACAAAAGAAGACCAAGACCCACAGTTGGG 
GCAACGTCCCAGATATTTAGGGGAATTATACCCACAATTTTGACATACAAATGTCGCTTT 
TTTCTTTGCGATGACAAACCTCTTTCTATATCTCTAACTCACACTCAATCACTTGGCAAA 
AATCAATCTTCTCATTTGGCACAAACTGGCGCATGAGCATTCGATGAGCAACAACTACCA 
CAGTCTGATGTTCTCGATACTTAGACATACATTCTAGAAACCGAGACTTCATTTCCGTAG 
CTGTCTCATATTGAATAGGACTATTAGGAAGCAACTCCCCCTTGTTTTCTAAAAACAGTC 
TTCTAGCTGTTTCAAAGTTTTCTATTCCTGTTTTATAGACCTGCCATTCATGTAATAAAG 
GCTCTACTCTTAAAGGAAGACCCGT 

ORF Predictions: 

ORF #  Start  End  Direction  Length
1      220    513  R          98 aa 

> 3112866-2 ORF translation from 220-513, direction R 
VEEVEVAEVKNARVSLTGEKTKPMKLAEVTSINVNRTKTEMEEFNRVLGGGWPGKSRPH 
RWGSWDWEINSSPTSLNPWPSGDSSLCQWGGVCPAD* 

Description: 
SMS PROTEIN. - ESCHERICHIA COLI. 

Assembly ID: 3113664 
Assembly Length: 602bp 

> 3113664 Strep Assembly -- Assembly id#3113664 

TTATGTCAGTGGGATTACGCCTAATCTCCCAGAAGCAGAATTATTATCCGGTCAGGAAAT 
TAAAACCTTGGNAGACATGAAAACTGCAGCGCAGAAATTGCATGATTTAGGAGCGCCAGC 
AGTCATTATCAAAGGGAGGCAATCGTCTTAGTCAGGACAAGGCTGTGGATGTCTTTTATG 
ATGGACAGACCTTTACTATCCTAGAAAATCCAGTTATCCAAGGCCAAAATGCTGGTGCAG 
GTTGTACCTTTGCCTCTAGCATTGCCAGTCACTTGGTTAAAGGTGATAAACTTTTGCCAG 
CAGTAGAAAGCTCTAAGGCTTTCGTTTATCGTGCTATTGCACAAGCAGATCAGTATGGAG 
TAAGACAATATGAAGCAAACAAAAACAACTAAAATCGCCCTTGTATCCCTATTAACCGCC 
CTTTCTGTGGTTCTAGGTTATTTCTTAAAAATCCCAACACCTACAGGNATTCTAACTCTT 
TTAGATGCTGGTGTCTTCTTTGCGGCCTTTTACTTTGGTAGTCGTGAAGGAGCGGTAGTC 
GGAGGACTAGCAAGTTTCTTGCTTGACCTCTTATCAGGCTACCCTCAGTGGATGTTTTTT 
AG 

ORF Predictions: 

ORF #  Start  End  Direction  Length
1      165    392  F          76 aa 

> 3113664-1 ORF translation from 165-392, direction F 
VDVFYDGQTFTILENPVIQGQNAGAGCTFASSIASHLVKGDKLLPAVESSKAFVYRAIAQ 
ADQYGVRQYEANKNN* 

Description: 
Thi protein - Rhizobium meliloti 

Assembly ID: 3113716 
Assembly Length: 456bp 

> 3113716 Strep Assembly -- Assembly id#3113716

CTGGATACTAAGAGAAATCAAAAAAGCACTCTAGGATAGAGGCCTAAAGTGCTTAGTTTC 
AAGGCTTTACAGCCTATCATATTTAATAAAATATTACAACATCTTGTTGTAGAATTCAAC 
GACAAGTGCTTCGTTGATTTCTGGGTTGATTTCGTCGCGTTCTGGCAAGCGAGTCAAI^A
ACCTTCCAATTTTTCAGCGTCGAATGATACGAATGCTGGACGTCCAAGAGTAGCTTCTAC 
TGCTTCAAGGATTGCTGGAACTTTCAATGATTTTTCACGAACTGAGATCACTTGACCTGC 
AGTTACGCGGTATGATGGGATATCAACGCGTTTCCCGTCAACAAGGATGTGACCGCTGGT 
TTACAAATTGGACCAAACTTGACGACCAGTAGTCGCGAGACCAAGACGGTAAACAACGTT 
ATCCAAACGACGTTCCAAAAGAAGCATAAAGTTGAA 

ORF Predictions: 

ORF # Start End Direction Length 
1 94 291 R 66 aa 

> 3113716-1 ORF translation from 94-291, direction R 
VISVREKSLKVPAILEAVEATLGRPAFVSFDAEKLEGSLTRLPERDEINPEINEALWEF 
YNKML* 

Description : 

30S RIBOSOMAL PROTEIN S4 (BS4). - BACILLUS SUBTILIS.
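The translations in this listing appear to follow a fixed convention. They use the standard genetic code read codon by codon from Start, and they show the initiating GTG as V rather than M. Stop codons appear as '*', and direction R means the reverse complement of the Start-End span. The Python sketch below is a minimal implementation under those assumptions. The names translate_orf and reverse_complement are illustrative. Mapping any codon that contains N to X is also an assumption, chosen to mirror how ambiguous residues appear in translations such as 3174186-1. The two test fragments are single 60-base lines copied from assemblies 3857842 (forward) and 3113716 (reverse), and the coordinates in the calls are relative to each fragment.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string over A, C, G, T and N."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate_orf(assembly: str, start: int, end: int, direction: str) -> str:
    """Translate the 1-based inclusive span start..end of an assembly.

    direction 'F' reads the span as given; 'R' reads its reverse complement.
    The first codon is translated literally (GTG -> V), as in the listing.
    Codons containing any base outside A/C/G/T become 'X'.
    """
    region = assembly[start - 1:end].upper()
    if direction == "R":
        region = reverse_complement(region)
    elif direction != "F":
        raise ValueError("direction must be 'F' or 'R'")
    codons = (region[i:i + 3] for i in range(0, len(region) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# One line of assembly 3857842; the forward frame starts at fragment position 3.
FWD = "TTATCGTAACCCATGAGATGGGATTTGCCCGCCAGGTTGCCAACCGCGTTATCTTTACTG"
# One line of assembly 3113716; the reverse-strand ORF starts at fragment position 51.
REV = "TGCTTCAAGGATTGCTGGAACTTTCAATGATTTTTCACGAACTGAGATCACTTGACCTGC"

print(translate_orf(FWD, 3, 59, "F"))  # IVTHEMGFARQVANRVIFT
print(translate_orf(REV, 1, 51, "R"))  # VISVREKSLKVPAILEA
```

Both outputs match the corresponding stretches of translations 3857842-1 and 3113716-1. A production tool would more likely use an established library such as Biopython.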

Assembly ID: 3174176 
Assembly Length: 1961bp

> 3174176 Strep Assembly -- Assembly id#3174176

CTAATATAGAATAATCACCGCCGTTGTGAAAGAACGATTGGATGATAATCCAATCGTTCA 
GGGAAATTGGAAGACCTTGGGTTTCCAATTTAGGCATGAGACACCTTTGGTGGCTGCTGC 
CGTCCCTCACAAGCTAAGGTGATTGTTGAAAAAGAGGAAAAAGGAGAAGAAATGAAACCA 
GTAATTTCCATCATCATGGGCTCAAAATCCGACTGGGCAACCATGCAAAAAACAGCAGAA 
GTCCTAGACCGCTTCGGTGTAGCCTACGAAAAGAAAGTTGTTTCCGCACACCGTACACCA 
GACCTCATGTTCAAACATGCAGAAGAAGCCCGTAGTCGTGGCATCAAGATCATCATCGCA 
GGTGCTGGTGGCGCAGCGCATTTGCCAGGCATGGTAGCTGCCAAAACAACCCTTCCAGTC 
ATTGGTGTGCCAGTCAAGTCTCGTGCTCTTAGTGGAGTGGATTCACTCTATTCTATCGTT 
CAGATGCCGGGTGGGGTGCCTGTTGCGACCATGGCTATCGGTGAACTCTTTTTTAGGATA 
TAAAACAGGGTTCGGATAAGTTTTTTTGCAAGGTGGATGATGGCTACATTGTAATGTTTT 
CCTTGTTCTAACTTAGTCTTAAAAGCAGGTGAAAAGTGAGGGCATGCTTTGGCAGCTTGT 
ATGAGTACCTACCGCAGATAAGGGGAACCCCGTTTGACCATCCTCCCAGCTAAATCAATC 
TGACCTGACTGATAAATAGAAGAATCCAGTCCAGCGAAAGCTTGTAATTGAGCAGGATTA 
TCAAAGGCATGAATATTTCGAATCTCGGCTAAAATGACCGCCCCTAAACGATTCTCAATC 
CCAGTAACCGTCGTGATGACCGAGTTTAACTCAGCCATCAAGTCATTGACACATTTTTCC 
GCCTTGTCAATGAGCCTCTTGTAATGTTTGATGTTTTCATTACACGAGATAAAACGTCTA 
TGCGTTATCAAACTCATTACCAATTAAAACAAATGTGGTTAGATCCTTTCGGAAATTGTC 
AAGCGATTGGAGGAAATGAACTAATCCACAGCGGCTTATTCCAAGTATACCACTTGGGCT 
TTGGCAGTAGCTAACTGCGCTAAATATAATATAAGGAGGAGTAAAATGAAGACAGTTCAA 
TTTTTTTGGCATTATTTTAAGGTCTACAAGTTCTCATTTGTAGTTGTCATCCTGATGATT 
GTTCTGGCGACTTTTGCCCAAGCCCTCTTTCCAGTCTTTTCTGGACAAGCGGTGACGil^G
CTAGCCAATTTAGTTCAAGCTTATCAAAATGGGCAATCCAGAACTTGTATGGCAAAGCCT 
ATCAGGAATTCATGGTCAATCTTGGCCTGCTGGTTTTGGGTTCTATTTATCTCTAGGTGT 
AATATAAACATGTGTCTCATGACGCGCGTGATTGCAGAATCGACCAACGAGATGCGCAAA 
GGTCTCTTTGGTAAGCTTGCTCAGTTGACGGTTTCTTTCTTTGACCGTCGACAAGATGGC 
GATATCCTGTCTCATTTTACCAGTGATTTGGATAATATCCTCCAAGCCTTTAACGAAAGC 
TTGATTCAGGTCATGAGCAATATTGTTTTATACATTGGTCTGATTCTTGTCATGTTTTCG 
AGAAATGTGACGCTGGCTCTCATCACCATTGCCAGCACCCCATTGGCTTTCCTTATGCTG 
ATTTTCATCGTGAAAATGGCACGTAAATACACCAACCTCCAGCAGAAAGAGGTAGGGAAG 
CTCAACGCCTATATGGATGAGAGCATCTCAGGCCAAAAAGCCGTGATTGTGCTAGGAATT 
CAAGAGGATATGATGGCAGGATTTCTTGAACAAAATGAGCGCGTGCGCAAGGCAACCTTT 
AAAGGAAGAATGTTCTCAGGAATTCTTTTCCCTGTCATGAATGGGATGAGCCTGATTAAT 
ACAGCCATCGTCATCTTTGCTGGTTCGGCTGTACTTTTGAA 

ORF Predictions: 

ORF # Start End Direction Length 
1 139 543 F 135 aa 

> 3174176-1 ORF translation from 139-543, direction F 
VIVEKEEKGEEMKPVISIIMGSKSDWATMQKTAEVLDRFGVAYEKKVVSAHRTPDLMFKH
AEEARSRGIKIIIAGAGGAAHLPGMVAAKTTLPVIGVPVKSRALSGVDSLYSIVQMPGGV 
PVATMAIGELFFRI*

Description : 

PHOSPHORIBOSYLAMINOIMIDAZOLE CARBOXYLASE CATALYTIC SUBUNIT (EC
4.1.1.21) (AIR CARBOXYLASE) (AIRC). - BACILLUS SUBTILIS.

Assembly ID: 3174186 
Assembly Length: 375bp

> 3174186 Strep Assembly -- Assembly id#3174186 

CTATCTCCAAGTNCGNTTGGAATNCCTCCGCNANCCACAACTCATCCAAGCACTTTNCAA 
CGTGNCCTGGTCCGGTCCTCCAGTGCGTCTNACNGCACCTTCAACCTGCNCATGGGTAGG 
TCACATGGCTTCGGGTCTACGTCATGATACTAAGGCGCCCTATTCAGACTCGGNTNCCCT 
AGGGCTCCGTCTCTTCAACTTAACCACGCAACAGAACGTNACCCGCCGGTTCATTCTACA 
AAAGGCAGNCTCTCACCCATTAACGGGCTCGAACTTGTTGTAGGCACACNGCTTCAGGTN 
CTATTTCACCCCCCTCCCGGGGAGCANCTCAACTGACCCNCACGGCACCGGTGNANNAAA 
CGGTCACTTAGGGAG 

ORF Predictions: 

ORF # Start End Direction Length 
1 83 283 F 67 aa

> 3174186-1 ORF translation from 83-283, direction F 
VRXXAPSTCXWVGHMASGLRHDTKAPYSDSXXLGLRLFNLTTQQNXTRRFILQKAXSHPL 
TGSNLL* 

Description : 
unknown 

Assembly ID: 3174374 
Assembly Length: 665bp

> 3174374 Strep Assembly -- Assembly id#3174374 
GGGGGGGGTKHNFNTT 

ATCATTTGAATTGGCCTGTGGATTTTAGCTAGCAATCCAGAGCGAGTTTTCTCCAAGACA 
GACCTCTATGAAAAGATCTGGAAAGAANACTACGTGGATGACACCAATACCTTGAATGTG
CATATCCATGCTCTTCGACAGGAGCTGGCAAAATATAGTAGTGACCAAACGCCCACTATT
AAGACAGTTTGGGGGTTGGGATATAAGATAGAGAAACCGAGAGGACAAACATGAAACTAA 
AAAGTTATATTTTGGTTGGATATATTATTTCAACCCTCTTAACCATTTTGGTTGTTTTTT 
GGGCTGTTCAAAAAATGCTGATTGCGAAAGGCGAGATTTACTTTTTGCTTGGGATGACCA 
TCGTTGCCAGCCTTGTCGGTGCTGGGATTAGTCTCTTTCTCCTATTGCCAGTCTTTACGT 
CGTTGGGCAAACTCAAGGAGCATGCCAAGCGGGTAGCGGCCAAGGATTTCCCTCCAATTT 
GGANGTTCAAGGTCCCTGTTAAATTTCCCCCATTTAGGGGCAACCTTTTAATGAAANTTT 
CCNTNATTTGCCGGGTANCTTTGAATCCCTNGGAAAAAACCCAACNAAAAAAAGGGCTTA
NNCCC 

ORF Predictions:

ORF # Start End Direction Length 
1 154 294 F 47 aa 

> 3174374-1 ORF translation from 154-294, direction F 
VDDTNTLNVHIHALRQELAKYSSDQTPTIKTVWGLGYKIEKPRGQT* 

Description : 

REGULATORY PROTEIN VANR. - ENTEROCOCCUS FAECIUM (STREPTOCOCCUS
FAECIUM).

Assembly ID: 3174972 
Assembly Length: 989bp

> 3174972 Strep Assembly -- Assembly id#3174972

CTACGATATCTTTGGTCTTTTGTAAGATATGAGGTCCACCCTTATGCGCCTCAGTTGGCA 
TTTCATGCGATTCAAGAAGTTGCCCCTCTTGATCAACCAAACCATACTTGATGTTGGUC
CACCGATATCAATTGCAACGTAATATGTCATAAATACCTCCTTTTAGATTAGAGGAAGCG 
CTCCTTGGTTTCACGAATCAAGGCAGCAGCCGCTTCTACAACTGGACGATCTTCTTCAGT 
CACTGGTGTCAATGGTGAACGAACAGATCCAATATTCAAGCCTTCATTGATTTTCAAGAC 
TTCTTTGATGACACCGTACATATTTCCATGAGCAGAAGTGAGTTTACCAATGATTGCGTT 
GATAGCATACTGCAATTCACGCGCTGTTTCTAGGTCCTTATCCGCAATCAACTGATTGAG 
TTTCAAGAAGAGTTCTGGCATAGCACCATAAGTACCACCGATACCAGCCCTAGCCCCCAT 
GAGGCGTCCTCCTAGGAACTGCTCATCAGGACCATTAAAGACGATATGGTCTTCTCCACC 
AAGGCTGACAAAGGTTTGGATATCTTGAACTGGCATAGAAGAGTTCTTCACACCGATAAC 
ACGAGGATTTTTCAACATTTCTGTGTAAAGGCTTGGAGTCAAAGCAACCCCTGCCAATTG 
AGGAATGTTGTAAATCACGTAGTCTGTGTTTGGAGCTGCAGAACTGATATCGTTCCAGTA 
TTTGGCAACTGAGTTATTCTGGCAAGCGGAAATAAATTGGTGGAATCCGTTGCAATAGCA 
TCTACTCCCAAGCTTTCAGCATGGCGAGCAAGTTCCATACTATCTTTAGTATTATTGCAA 
GCAACATGGGCAATAATGGTCAATTTACCTTTGGCTACCGCCATGACTTCTTCCAAAATC 
AACTTGCGATCTTCAACGCTTTGGTAGATACATTCACCAGAAGAACCATTGACATAAGAC 
CTTGAACACCTTTATCAATGAAGTATTGA 

ORF Predictions: 

ORF # Start End Direction Length 
1 169 678 R 170 aa 

> 3174972-1 ORF translation from 169-678, direction R 
VIYNIPQLAGVALTPSLYTEMLKNPRVIGVKNSSMPVQDIQTFVSLGGEDHIVFNGPDEQ 
FLGGRLMGARAGIGGTYGAMPELFLKLNQLIADKDLETARELQYAINAIIGKLTSAHGNM 
YGVIKEVLKINEGLNIGSVRSPLTPVTEEDRPVVEAAAALIRETKERFL*

Description : 

N-ACETYLNEURAMINATE LYASE SUBUNIT (EC 4.1.3.3) (N-ACETYLNEURAMINIC
ACID ALDOLASE) (N-ACETYLNEURAMINATE PYRUVATE LYASE) (NALASE). -
ESCHERICHIA COLI.

Assembly ID: 3175138 
Assembly Length: 1450bp

> 3175138 Strep Assembly -- Assembly id#3175138 

CTCCATATTTCTTAGCCTTCTCAATTAGGGTCTTGAAGTCTTCGACACCACCGATACGCT 
TACCAATATCAGCATAGTTCAAGTGACCAGAGTCATGGCTGTGATATCCTTAACTTTTTC 
CCAACCTTGAGGGTTGTTCATAATGCTACGATAAGCAATGGCACCATCTTGCCAATCAAC 
TTTCTTGTCTGCATTGGCATCTTCAGTGATAACAACCTTAGCACTTGGAAGTTCCTTCGT 
GTATTCTGGGAAAACAATGCCCTTATAAGCTTTTTCCCATTGCCATTCAGAGCTGTGGAT 
TCCTACATAGTTGGCATTTCCGACTGTTTCTTTATAAGCTGTCAAACGAGTCCAGTCATT 
CGAACCACCACCATAGCTATTTTGAGAGTTACTCCAAACACCAGCAGCAAGCTTATCTGT 
AGAAACAAATCCATACATGTAACCCTTAGCCAAATCCTTCATTGGATTGGTTACATCGAT
ATGATCATCTCCGCTGACATGCGTATTGTTTGACATGGTTGCCCCATCAAACTTAGCACC 
AGTTTGATCACTAGAAACAGAGACTAAAGCATTGCCGAGGAAACTAATAGAAGAAAGTAG 
TTTTCTTTCGTCATCAATCTTTTGACCTGGAGTGACTTGATTGTGGTTGACAATCTTGGT 
CACATCAAAGTGCAATTGATTGTCCACAACTTGCAAGCGTACTGTCATTTCCGCATTGAT 
TAAGTGAGCATCATCGCGAAGCTTCATCAAGTACTCTGCTGTTGTCTCATTGATTTTTTT 
ATAAGTGACTTCAGGGGTGATTCGGTGGTTATTGATAAAGACTTGGTTGAATTGTTGCAC 
CTGTCCTGGCAAAGTATGTCCATTCAAGGTGTATCCCTTGACACGAAGGAAGGCTTGGTC 
AATTACTGCCTTAAGTACCTTAAACTGGATCGTATCATAAGTCACCTTGCTATCGTCAAC 
AACCGGACCTGTTTCTTTCTGGGCAGGGGTATCCTCTGGGTTTTACCCTCTCTGTGGCTA 
TCCGTTTCAACGCTTGAACAACTGGTCGCTCATCGTCATAAGAGCCCGCCTTGAGAAAAA 
TCTTCTTCTCATTTCTAAGATGGTCATTGACCGCAGCTGGTAGAGTCACTGTGTCAAAGA 
AGATTGACATCCTTATTTGCCTGGCATTTACCTGACCGTCTGACTTGAAGACTGATAGAG 
AGACGGTTTGTTGATCCTGTTTCAGGAGCAGCAACACGACTACCTCTATACCAAGTGCTA 
GTTGTTGGAGATTTATACTCCCAGAACCAGCCATCCTTGTCATAACCGACAAAAACATTA 
TTATTGGTATCTTTAAATTTCAAGGAGACACCAAAGCGTGATTTGCCCTTTTCAGAATCT 
TCTTTGAAGGTTAAATCAACAGTTGCATTTCCATTGGCATCAACGGTCAAGCCCTTCTTT 
TCAAACAGAG 

ORF Predictions: 

ORF # Start End Direction Length 
1 79 945 R 289 aa 

> 3175138-1 ORF translation from 79-945, direction R 

VTYDTIQFKVLKAVIDQAFLRVKGYTLNGHTLPGQVQQFNQVFINNHRITPEVTYKKINE
TTAEYLMKLRDDAHLINAEMTVRLQVVDNQLHFDVTKIVNHNQVTPGQKIDDERKLLSSI
SFLGNALVSVSSDQTGAKFDGATMSNNTHVSGDDHIDVTNPMKDLAKGYMYGFVSTDKLA
AGVWSNSQNSYGGGSNDWTRLTAYKETVGNANYVGIHSSEWQWEKAYKGIVFPEYTKELP
SAKVVITEDANADKKVDWQDGAIAYRSIMNNPQGWEKVKDITAMTLVT*

Description : 
unknown 

Assembly ID: 3175860 
Assembly Length: 420bp

> 3175860 Strep Assembly -- Assembly id#3175860 

CTGCGAGTTGTGAGGCTCCTATTATGTCTCGTGATTAAAATCTCTATAAGGTGATTTTGG 
AGGGAAATTATCGGGCGACAGCGGGTAGAGAAGAGATGAAAGAGGCTATTTTGGAATATC 
AAGCAAATCCTGCTGCCTTAAAAGATCTCAAAGAAAAGGCTAAGAATATTTCCAGAGAGT 
ATTCTGAAGAGCATCTGTTACAAATCTGGTTGGACTTTTATGAGAAACAAGCCGCTTTAG
GGACAAAGTAAAAAGTGAGGTAATCTATGCGAATTGGTTTATTTACAGATACCTATTTTC 
CTCAGGTTTCTGGTGTTGCGACCAATATCCCAACCTTGAAAACCCACCTTGAAAACACGG
ACTTGCCTGCATTTNTATCTCATACAATCCACCGAATTTCGATGTCCCCCTCCCTACAAC 

ORF Predictions: 

ORF # Start End Direction Length 
1 51 251 F 67 aa 

> 3175860-1 ORF translation from 51-251, direction F 
VILEGNYRATAGREEMKEAILEYQANPAALKDLKEKAKNISREYSEEHLLQIWLDFYEKQ 
AALGTK* 

Description : 
unknown 

Assembly ID: 3175918 
Assembly Length: 661bp 

> 3175918 Strep Assembly -- Assembly id#3175918

CTCCCCAAACTTTTATTTGAGAGTGAACGGTATAAGAATATGAAACCGGAGGTTAAGGTG 
GTTTACTCAGTTTTAAAAGATCGGTTGGAGTTGTCTTTGAGCAAAGGTTGGATTGATGAG 
GATGGGACTATTTATTTGATTTATTCCAATTCAAATTTGATGGCACTTTTAGGCTGTTCA 
AAGTCAAAATTACTCTCCATGTGAGTTTGAAGTGACATTTTTAGATGATTACCATAAAAA 
ACATAACTACCCACTATTTTACGAATCCTATCTTCAAAACGTTATGGAATTCCTTGAAAG 
TCAAGACATAAAGAATGGGGTTGATGCCTTTGTAGATGATCATCAAAATCTCGTTTTTGT 
TTTATATGGACAAGGCTATCGAGCCGAGGGAAAAGAGGGAATACTTACAACCCAAGTAAC 
TGTAAAAGCTTATGATGAAGACAAGAAACCGATTAACTTCGCAAATTTATTAGATTCCTT 
AATCGTGTCAGAATATCAAATGGAACCGAATCTTTGGGAGGTCTCCTATGATTGATCTCT 
ATCTAAGTAAAAATAGCCGAAGAAATCAACTTCTTTTAGACTTCTTCCAAAACTATGGCA 
TCGAGGTATCTTGTCATTCAGTTTCTGAAATGACAAAGGACAAATTAATTGAGATGATGA 
G 

ORF Predictions:

ORF # Start End Direction Length 
1 212 535 F 108 aa 

> 3175918-1 ORF translation from 212-535, direction F 
VTFLDDYHKKHNYPLFYESYLQNVMEFLESQDIKNGVDAFVDDHQNLVFVLYGQGYRAEG 
KEGILTTQVTVKAYDEDKKPINFANLLDSLIVSEYQMEPNLWEVSYD*

Description : 
unknown

Assembly ID: 3811220
Assembly Length: 1429bp

> 3811220 Strep Assembly -- Assembly id#3811220 

CTGCCCCTGTAAGGCTGGACGATTGCCTTTCTTAGTATCCGCAAAGAGGTAAACTGAGAA 
TAGAGAGGATTTCTCCTTCAATATCTTTGACAGACAGGTTCATCTTGCCTTCTACGTCTG 
AAAAAATCCGCATATTGACCAGTTTTCTCACAGCATAGTCCAAATCTTCCTCTTGGTCCT 
CTGGTCCAACACCAACCAGCAATAAAAGTCCCTGATTGATTTTTCCCTGAATCTGGCCTT 
CTATACTCACTTGGGCTTTTTTAACCCGTTGGATAATGATTTTCATAATAGCCTTTCTAG 
TAAGAGCTAGGACAACTAGCCGTTGGTCCGTTTGACAGAGTAAACTTCTGGCACACTCTT 
AATTTTATCGACAACCGTGGTCAGTGTAGAGAGGTTGGCAATACCGAAGGACACATGGAT 
ATTAGCAAACTTCATATCCTTGGTTGGTTGGGCATTGACCGTTGAAATATTCTTGGTTGT 
ATTTGAAAGAACTTGCAGTACATCGTTCAACAGTCCTGTACGGTTGAGACCGTAGATATC 
GATATGGGCCATATACTCCTTATTTGAGCTAGAGTACTGGTCTTCCCATTCCACATCAAG 
GAGACGTTGCTCGTAGTTTTCTTGGGCACGCAGGTTCATACAGTCCACACGGTGAATAGC 
CACACCACGACCCTTGGTAATGTAGCCAACAATATCGTCACCAGGCACGGGGTTACAACA 
CTTAGCAATCCGCACTAGGAGACCAGAAGCACCTTCAATAACCACTCCCCCCTCATGCTT 
GACCTTGGAGAGTTTCTTTATTTTCAACCTTGACCTCGCCACCTTTGACAAGCTCCTCTG 
CCTCAGCCTTGGCCTTGGCACGCTCTTCCTCACGGCGTTCTTTTTCAGTCAGACGGTTAA 
AGACGGTAATCGCACCGATTTCCCCAAAACCAATGGCCGCAAAGAGGGAGTCTTCTGTCT 
TGTAACTGGTCTTTTGCAGAACTTGATCCATGTGGCGCTTGTCCATAAATTTATTTGCCA 
CATAGCCATTTTCTTGGAACTGAGCCATCAGCATCTCACGACCCTTGTTGACAGACAATT 
CCTTATCTTGGTTTTTAAAGAACTGGCGAATCTTATTGCGCGCCTTGCTAGTCTTGACCA 
TATTGAGCCAGTCACGGCTAGGTCCAAAGGAGTTCGGGTTGGCGATAATTTCAACCTGAT 
CCCCTGTCTTTAACTTGGTTGTCAGTGGAACCATGCGGCCATTGACCTTGGCACCAGTTG 
CTTTTTCACCGACCTTGGTATGGATTTCGTAGGCAAAATCAATCGGTCCTGAATCTTTGG 
GAAGAGAACGGACAGCTCCATCTGGGGTAAAAACGTAAATCTCCTCAGCCAGATAGTTTT 
CCTTAACAGAGTCCACAAATTCCTTAGCATCATCAGCCTGGTCTTGGAG 

ORF Predictions: 

ORF # Start End Direction Length 
1 316 873 R 186 aa 

> 3811220-2 ORF translation from 316-873, direction R 
VRKSVPRPRLRQRSLSKVARSRLKIKKLSKVKHEGGVVIEGASGLLVRIAKCCNPVPGDD
IVGYITKGRGVAIHRVDCMNLRAQENYEQRLLDVEWEDQYSSSNKEYMAHIDIYGLNRTG
LLNDVLQVLSNTTKNISTVNAQPTKDMKFANIHVSFGIANLSTLTTVVDKIKSVPEVYSV
KRTNG * 

Description : 

stringent response-like protein - Streptococcus equisimilis

Assembly ID: 3811436
Assembly Length: 1513bp 

> 3811436 Strep Assembly -- Assembly id#3811436 

CTCTGCAATGATGTACTCAAACATCTCCGCTTCTAGTTCCTCCTTAGGCAGAGGCAATTT 
CCCACGTCGCATCCGGTTCATAAAGACCGTATGGTTTTCTAAAATCAAACTATACAAACT 
CATGTGGGGAATATCCAATCCAATGGCTTTAGCCACATTTTCCTTTACTTGCTCCATGGT 
CTGACCAGGCAGAGCATAAATCAAATCAATGGAGATGTTGTCAAAACCAGCCAGTTTCAG 
GCGATCGATATTTTCATAAATATCCTTCTCCAAATGACTGCGCCCAATCTTTTTCAACAT 
CTTATCATCAAAGGTCTGGACACCTAGCGAAACACGATTGACAGCCGAATTTTTCAAAAC
AGCTATCTTATCCGCATCCAAATCGCCTGGATTGGCTTCAATGGTCAACTCTTCCAAGAC 
AGACAAATCCAAGTTTTTAGTCAAGCCATTCAGTAACACCTCCAGTTGCGGAGCCGACAG 
GGCTGTCGGTGTTCCACCACCGATATAAAGGGTTGACAACTTTTCAATATCATAAGAACG 
AAACTCTTCCAGCAGATGCTCTAAATAGCTGTCGACTGGCTGATTTTTGATGAAGACCTT 
TGAAAAATCACAATAATAACAAATCTGGGTACAAAATGGGATGTGCACATAGGCTGACGT 
TGGTTTTTTCTGCATAGTAATTATTATACCACAAAGACTAGATTCCAGATAAAAATCACC 
ATCCCCAGATACATAGTCCGTCCGGAGATGGTGATGGTTTATTCTTCTGTTATATCAATC 
ACAATCTCTTCTGAGTCATCAAGAGCTTCGGCTTTTTCTTGCCATTGTTCCTTGAGATTA 
TTTAATTGATTTTTTGATGCTTCTGTCGCTTGAAAAGCATAGGATTTAGCTTGAGCAAGT 
ATACTGTCCACAGTGATTTCACCTGACTCAACCTGTTCTTTTGTTTTCAGAACAAAATCT 
GTAGCCTGCTCCTTAACTTCTGTCAGTTTTTCACAGACTTGCTCCTTGGCATACTCCGGA 
TCTTCTCTCAAATCATCTAAAAAATCTTGAGCCTGACTGCAAACTTGTTTGCCCTTATCA 
CTTGTTAAAAACAAGGCAAGAGCTGCACCTGAAACGGTTCCTAAAAGGATTGAGGATAAT 
TTACCCATAAGGATTCTCCTTTTTTATTTTTTGAAAAATTTACTTGCAAGACGAAGAGCT 
GACAGACTTGCACCAGTCTTGAGTGTTTTTGAACCAGCTGATGAAGCTTTCTTGCTCAAG 
ACACGCGCATGGTCATTGAGGTCTGAAACAGATAGAGATAAATCTGCAACAGCACTGAAG 
AGTGGATCAATCGTAGCCACCTTGACATTGATATCATCTGCCAAGACATTGACCTTAGCC 
AACAACTCATTGGTGTGATGCAAGGTCACATCCACATCTGAAGTCAAGGTTTTAATCGTC 
TTTTCTGTTTCATCGATGACACGACCAAGCTTTTGTACAGTAATGATCAGATAGACCAAA 
AAGACAATCACAG 

ORF Predictions:

ORF # Start End Direction Length 
1 1164 1511 R 116 aa 

> 3811436-3 ORF translation from 1164-1511, direction R 

VIVFLVYLIITVQKLGRVIDETEKTIKTLTSDVDVTLHHTNELLAKVNVLADDINVKVAT 
IDPLFSAVADLSLSVSDLNDHARVLSKKASSAGSKTLKTGASLSALRLASKFFKK* 

Description : 
unknown

Assembly ID: 3811984
Assembly Length: 505bp

> 3811984 Strep Assembly -- Assembly id#3811984 

CTCTTGTCAGAGAAATTTACAAAACGTTAGGAGAATAAGATGGCATTTATTGAAAAAGGT 
CAAGAAATCGATATGGAAGTCATCAAGGCTGAAACCCAATTGTCTGCAGAAGCCTTGAGA 
CTCAAGGAAAGCCGTGACAGGGAATTGGCAGATATTATTTCAGGGGAAGATGACCGTATT 
CTCTTGGCTGATTGGTCCTTGCTCTTCTGATAATGAAGAGGCGGTCTTGGAATATGCTCG 
CCGTTTATCCGCCTTGCAAAAGAAGGTAGCGGATAAGATTTTCATGGTCATGCGCGTGTA 
TACTGCTAAGCCTCGTACCAATGGAGACGGCTATAAAGGGTTGGTTCACCAGCCAGATAC 
TTCTAAGGCTCCAACCCTGATTAACGGCTTGCAGGCTGTGCGCCAGTTGCACTACCGCGT 
TGATTACAGAGACTGGTTTGACAACGGCAGATGAGATGCTTTATCCGTCAAATCTGATCT 
TGGTGGATGACTTTGGTCACCTACC 

ORF Predictions: 

ORF # Start End Direction Length 
1 134 454 F 107 aa 

> 3811984-2 ORF translation from 134-454, direction F 
VTGNWQILFQGKMTVFSWLIGPCSSDNEEAVLEYARRLSALQKKVADKIFMVMRVYTAKP
RTNGDGYKGLVHQPDTSKAPTLINGLQAVRQLHYRVDYRDWFDNGR* 

Description : 

PHOSPHO-2-DEHYDRO-3-DEOXYHEPTONATE ALDOLASE, TYR-SENSITIVE (EC
4.1.2.15) (PHOSPHO-2-KETO-3-DEOXYHEPTONATE ALDOLASE) (DAHP
SYNTHETASE) (3-DEOXY-D-ARABINO-HEPTULOSONATE 7-PHOSPHATE
SYNTHASE). - ESCHERICHIA COLI.

Assembly ID: 3857228 
Assembly Length: 1827bp

> 3857228 Strep Assembly -- Assembly id#3857228 

CTCTTTTAACCGTTTTAGCGGTGACACCGAGGATTTTTTCAGGACCCAAGACTTGTCGGG 
CAACCGAAACTGGGAGTTCGTCATCTCCAATATGCAGACCAGCAGCATCAACCGCAAGAC 
AAACATCCAACCGATCATCGATTATCAAGGGGACCTGATAGGCATCTGTTATTTCCTTGA 
CTTGTTTTGCCAGTTGATAATATTGATTGGTTGTGAGATTTTTTTCTCGCAATTGGACTA 
TGGTAACCCCTGAACGGCAGGCCGTCTCAACTTTTGCAAGAAAGCTTTCCACGGAATCTT 
GATAGCGATTGGTTACCAGATATAGTCTAAGCGCTTCTCTATTCATAAACCTCTCCTTTG 
ATGGTATCTAGCCAATTTTCATCTCTTCTTAGGAGCGAAAGCTGATTGAGTACTTGGTAA 
CGAAATTCTTCCAATCCCATTCCTTGAACAACTATTTTCTCAGCAGCGATATTGAGATAA 
GAGACTGCTAAGCAAGAACTTCAAAACCAGTCTTTCCTTGGCTGAGAAAAACAGCTGTTA 
AGGCTCCAACCAAGTCTCCTGTCCCTGTTATCCAGTCTAATTCAGTACAGCCATTCTCAA
GTACAGCAACTTGATTCTCCGAAACAATAAGGTCCTTGGGACCTGTGACTAAGAATGACA 
TACCACGATAGGTCTGACACCAGTCTTTCAAGACTTGAAGCAAATCCTCCGTTTCTTGAT 
CTTTAGCACTCGCATCGACCCCAACGCCGTGATGCTTTAATCCAACAAGACTTCGAATTT
CTGACATGTTTCCTTTAAGGACCGTAGGTCTATAGTCTAAAAGGTCTTTAACTAAGCTCT 
TACGAATGGATGAAGTCGTTACGCCAACCGCATCTACTACCATCGGGAGAGAAGATTGGT 
TTGCATACAAAGCTGCCATGCGGATTGCTTTTTCCTTCTCAGCTGACAAATGCCCCAAAT 
TGATGAAGAGAGCCTGGCTTTGCTTAGTAAAATCAAGAACTTCACGGGGATCATCTGCCA 
TGACAGGTTTGCATCCCAGAGCCAAAATCCCATTTGCCAGCATCTCACAAGAAATCTCAT 
TGGTCATACAGTGAATGAGGGAACTAGAGCCTATAGGAAAAGGATTTGTCAATGCCTGCA 
TCATTCTATCCTTTCAGCAAAGAAATATCCTTGCACTTTTTTAAAGAATTCCTGCTTGAT 
TAAAAATCTAAATGCAATAAAGGAAATCGCTGTACCAATCAAGGTTGCTCCGAAAAATCG 
AGGCGTGTAGATAAACCAACTAAGCTTAGCAGCCGATCCTGTAAAGAGCACCATAACAGG 
ATAGGAAACAATAGAACCAATAATACCTGTTCCCACAATTTCTCCCAAGGCAGAAAAGTA 
AAATTTTCGACCGTACTTATAAAAGAGACCTGCTAGAAGGGCTCCAAAAGTCGCTCCTGT 
GAGAGATAAAGGAGCTTATCGGAATACCCTTGAGTCGTCATACGGATAAAGGCTGTCACT
GTAGCCATAGCCAAGGCATAAACAGGTCCCATCATGATTCCCGCTAGAATATTGACTACA 
CTGGACATCGGTGCCATTCCCTCAATCCGAAAGATAGGTGTAAGGACTACATCAAGGGCA 
ATCATCATAGATAAAATGGTCAATTTGTGAACTTGTAGTTGGTGCTTTCTCAAGTTTCTA 
TTCTTCTCCTTTTTCTAAAGACTGTAAATCGCTCTTCCATGTCTGGTGTTGGTAAGCCAT 
CTCCCAAAACTTGGCTTCCATATGAACACTGATGTGGAAGGCATCTAGCATTTTTTGCTT 
ATCTGTCTCATCACTTTCTCGATAGAG 

ORF Predictions:

ORF # Start End Direction Length 
1 1141 1356 R 72 aa 

> 3857228-2 ORF translation from 1141-1356, direction R 
VGTGIIGSIVSYPVMVLFTGSAAKLSWFIYTPRFFGATLIGTAISFIAFRFLIKQEFFKK 
VQGYFFAERIE* 

Description : 
unknown 

Assembly ID: 3857842 
Assembly Length: 485bp

> 3857842 Strep Assembly -- Assembly id#3857842 

CTATTGCCAATCCATATAGCCTATCAGGTGGTCAATAACAACGTGTGGCCATCGCTCGTG 
GCCTATCAATGAATCCAGACATCATGCTCTTCGATGAACCAAATTCTGCCCTTGACCCTG 
AGATGGTTGGAGAAGTAATTAACGTTATGAAGGAATTGGCTGAGCAAGGCATGACCATGA 
TTATCGTAACCCATGAGATGGGATTTGCCCGCCAGGTTGCCAACCGCGTTATCTTTACTG 
CAGATGGCGAGTTCCTTGAAGACGGAACACCTGACCAAATCTTTGATAACCCACAACACC
CTCGTCTGAAAGAGTTCTTAGATAAGGTCTTAAACGTCTAAACTCAAACTGCAAGGATTT 
CCTTGCAGTTTTTCTACCTCGTATTGGAATTTTTGATTTTTCGGAAAATTATGTTAGAAT 
TAAGTTTATGAAATGAGGTTTCCTCATACCTAGCAAGACTAGGAATAAAAATAGAAATTA
GGTAG 

ORF Predictions: 

ORF # Start End Direction Length 
1 45 341 F 99 aa 

> 3857842-1 ORF translation from 45-341, direction F 
VAIARGLSMNPDIMLFDEPNSALDPEMVGEVINVMKELAEQGMTMIIVTHEMGFARQVAN
RVIFTADGEFLEDGTPDQIFDNPQHPRLKEFLDKVLNV*

Description : 

GLUTAMINE TRANSPORT ATP-BINDING PROTEIN GLNQ. - BACILLUS
STEAROTHERMOPHILUS.

Assembly ID: 3857996 
Assembly Length: 1547bp 

> 3857996 Strep Assembly -- Assembly id#3857996 

NTCTTGGGCNCNGGGCGNNTCCTTTGAGGACNACGGTATCGATGACCTTGATCTCAAGTG 
CAAGCAGTATCTGAATCTGCAGCAGCACCTGTCCGTGCAAAAGTTCGTCCAACATACAGT 
ACAAACGCTTCAAGTTATCCAATTGGAGAATGTACATGGGGAGTAAAAACATTGGCACCT 
TGGGCTGGAGACTACTGGGGTAATGGAGCACAGTGGGCTACAAGTGCAGCAGCAGCAGGT 
TTCCGTACAGGTTCAACACCTCAAGTTGGAGCAATTGCATGTTGGAATGATGGTGGATAT 
GGTCACGTAGCGGTTGTTACAGCTGTTGAATCAACAACACGTATCCAAGTATCAGAATCA 
AATTATGCAGGTAATCGTACAATTGGAAATCACCGTGGATGGTTCAATCCAACAACAACT 
TCTGAAGGTTTTGTTACATATATTTATGCAGATTAATTTACAGAGGGACTCGAATAGAGC 
CCTCTTTTCAGGTTTTACCGTGACAATCCCTATTAAAAATTATATCAAAATCGTGAAAAT 
ATTGGAAAAGTATGGTAGAATGAAAATTGTCGTGTGAACGATAATACTCATTCTTGATGA 
ATTGTGAAGCAGTTGCCCTTGGGTCGTTTTGCGAGTTGAAGTCAAGAAGAGGAAAAAAAC 
AAAAAGGAGAAATACTCATCGAATTTCAATGAAACAACTTCTTGAGGCTGGTGTACACTT 
TGGTCACCAAACTCGTCGCTGGAATCCTAAGATGGCTAAGTACATCTTTACTGAACGTAA 
CGGAATCCACGTTATCGACTTGCAACAAACTGTAAAATACGCTGACCAAGCATACGACTT 
CATGCGTGATGCAGCAGCTAACGATGCAGTTGTATTGTTCGTTGGTACTAAGAAACAAGC 
AGCTGATGCAGTTGCTGAAGAAGCAGTACGTTCAGGTCAATACTTCATCAACCACCGTTG 
GTTGGGTGGAACTCTTACAAACTGGGGAACAATCCAAAAACGTATCGCTCGTTTGAAAGA 
AATTAAACGTATGGAAGAAGATGGAACTTTCGAAGTTCTTCCTAAGAAAGAAGTTGCACT 
TCTTAACAAACAACGTGCGCGTCTTGAAAAATTCTTGGGCGGTATCGAAGATATGCCTCG 
TATCCCAGATGTGATGTACGTAGTTGACCCACATAAAGAGCAAATCGCTGTTAAAGAAGC 
TAAAAAATTGGGAATCCCAGTTGTAGCGATGGTTGACACCAATACTGATCCAGATGAIi^T
CGATGTAATCATCCCAGCTAACGATGACGCTATCCGTGCTGTTAAATTGATCACAGCTAA 
ATTGGCTGACGCTATTATCGAAGGACGTCAAGGTGAGGATGCAGTAGCAGTTGAAGCAGA 
ATTTGCAGCTCCAGAAACTCAAGCAGATTCAATTGAAGAAATCGTTGAAGTTGTAGAAGG 
TGACAACGCTTAATTTATACAAATAGTAATTACCTAGGAGGGCGGGGCTTAGCCCGGCTC 
TCCTATTTTCAAAAAATATAGGAGAATTAAAATGGCAGAAATTACAG 

ORF Predictions:

ORF # Start End Direction Length 
1 58 456 F 133 aa 

> 3857996-1 ORF translation from 58-456, direction F 
VQAVSESAAAPVRAKVRPTYSTNASSYPIGECTWGVKTLAPWAGDYWGNGAQWATSAAAA 
GFRTGSTPQVGAIACWNDGGYGHVAWTAVESTTRIQVSESNYAGNRTIGNHRGWFNPTT 
TSEGFVTYIYAD* 

Description : 
unknown 

Assembly ID: 3858236 
Assembly Length: 740bp 

> 3858236 Strep Assembly -- Assembly id#3858236

CTATAAAAAAAAGGGTAACCAGTATGGAGGATGAATGTCTGGAACTATCTGAGAATCTCG 
GATTTTGGAAATCAGACCGATCATCATGAGATAAGGAAGGAAAGCACTTGTAAAAAGCAC 
TGTAACCACGCCAGTCCCCTGTCCCAAGAGGGTGAGGTGGTAGCGTAAAACCATGCGGAA 
AAATCCCTTTTTAGTGGTTGAAATTCTCTCCTTGCTGCGACGTTCTTTTTTGACCTTCTC 
CTCACTATTAAGCAGGATCACGTCATAAAAACGAGGAAGGACCTTCTTTTTGGTCAGATA 
AAGCAGGAAGAGAGTTAGTCCTATCCAAGCGAGCAGACCCAATATGGCTTCTATTGAAAA 
AGGCTCCACTGCTATTTTGTAAAAGATATGAAGAGGATAAAGGAGAAATGGAATGTCTCT 
AACTTTGTCAACAATACTTCCAAAAGTCGACTGAAGAAAGAAGATAAATATTAAAGGTAT 
GAGAACTCCTATCCCAATCATCACATTCGAAAAAATAGACTGATACTTTCTGAAGACCCT 
AGTCTGAGCCAAGAAATGTACTGCCACTACCGTCACTAAAGTAACAGAGACAAATAATAA 
GGTCAAGGACAGTAGCATCAAAGGCAAACCCAGCCAAAGAGAAGGAGCTAGACTAATATA 
GAGGGCTAGAAAATAAGCTAGGATTGGTACAATTCCAGTTAGAGCTGGCAAGAGGACAGA 
CAGTCCTTTAGCAATTCGAT 

ORF Predictions: 

ORF # Start End Direction Length 
1 1 261 R 87 aa

> 3858236-1 ORF translation from 1-261, direction R
VILLNSEEKVKKERRSKERISTTKKGFFRMVLRYHLTLLGQGTGVVTVLFTSAFLPYLMM
IGLISKIRDSQIVPDIHPPYWLPFFL* 

Description : 
unknown 

Assembly ID: 3858264 
Assembly Length: 2219bp 

> 3858264 Strep Assembly -- Assembly id#3858264 

ATCGAATTCGTTTTGCAAGTGGCGAAATGCGAACCACGTTTGTGTCTTTATAAGTTTCCA 
CGTCTTCTTTGTGGACACGACCGTTTGCACCTGAGCCAGAAACGTCGTAGAGGTTTATCC 
CTAAATCATCCGCTAACTTTCTAGCTGCAGGAGTCGCTGTTAGCTTGTCATCAGCCATGA 
CCTCTCCAATTCTATTTATGATACAAAGGGCGTCAAAAGCGACTGAAAAATAGGAAATCG 
ACGATGGCTTCGATGAAGCCAAGGAGATTTATCTTTTTTTCCAAGCTTTTAGCCCGTGCT 
CTAATCTAAGATATTAAGGACGAAGAGCTCTGCACCTAAAAGATACAAAGTTCTCGTCAG 
CTTTGTTTTATTTACATAACTTATCTTATGTAACTCTATTCTTTGTTATAAGTTTTTCGG 
ATTGCATCTTTGATACTTTCAACTGTTGGAATCATTGCACATTTTTAGGTTTTGCGCATA 
AGGCATCGGCACATCTTCTCCTGCACAACGGCGGATTGGTGCATCTAGATAGTCAAATGC 
TTCTGATTCTGAAATAATAGCTGAAATTTCACCGATATAGCCACTTGTTTTGTGGGCATC 
GTTGACCAGAACAACCTTACCAGTCTTCTTCACTGAGTTTATGATGATATCCTTATCAAG 
CGGAACAAGGGTACGTGGGTCAACAATTTCAACTGAAATTCCTTCTTCAGCTAATTCTTC 
AGCAGCTTGAACCACACGGCGAAGCATTTTTCCATAAGTGACAACTGTTACATCCGTTCC 
TTGGCGTTTGATTTCACCAACCCCAAGTGGAATTGTGTAGTCTGGATCAACTGGCACTTC 
CCCTTTTTGGTTAAATTCTGACTTGTACTCAAGTATAATAACTGGGTTGTTATCACGGAT 
AGAAGACTTAAGCAGGCCTTTCATGTCCGCAGGTGTTCCAGGTGCCACAACCTTAAGCCC 
TGGAATGTGAGTAAACCAAGACTCTAGAGATTGTGAGTGCTGGGCGGCAGAGCCAACTCC 
GTTACCAGCTGCACAACGAACAGTCATTGGAACCTGACCTTTACCACCAAACATGTAACG
TGTTTTAGCAGCTTGGTTGACGATATTGTCCATGGCAATAACAGAGAAGTCCATGAAGGT 
CATATCGACGATTGGACGAAGTCCTGTCATGGCTGCTCCTGCTGCAGCTCCAGAGATGGC 
AGCTTCAGAAATCGGACAGTCACGGACACGTTCTGGACCAAATTCTTCAAGCATTCCAAC 
AGAAGTACCGAAGTCTCCTCCGAAGACACCGACGTCTTCTCCCATCAAGAACACATTTTC 
ATCGCGAACGCATTTCCTCAGACATAGCAAGGATAATGGTGTCACGGAAGGACATTGTTT 
TTGTTTCCATTTTATCTCTTTCTCCTTAGTCTGCGTAAATATCTTCAAAGGCTGATTCAA 
GCGGTGGGAATGGGCTTTCCTCTGCAAATTTAACAGAAGCTTCTACTGCTTCCTTTACTT 
GCGCTTGGATTTCTTCCAATTCTTCGGCACTTGCAATGTTATTTTCAATAAGGTAATTGC 
GGAGGTTTTCGATTGGATCTTTTTGTTTCCACAATTCCACTTCTTCACGCGTACGATATT 
TACCAGGGTCAGATGATGAGTGACCGAGCCAGCGATAAGTTACACTTTCAATCAAGACTG 
GACCATTGCCACTGCGAACATGGTCTATAGCTTTCTGAAATCCTTCATAGACATCGATGA 
CATTGTTACCGTCTTCGATGAACATTCCAGGAATTCCATAAGCGGCGCTACGTTGATGGA 
TATGTTCTATATTGGTCATTTTCTTGATATCCGCAGAGATACCGTAACCGTTGTTAATGC 
AATAGAAAATGACTGGCAGGTTCCAGATAGAAGCCATGTTCACTGCTTCGTGGAAAACAC 
CTTCATTGGTCGCACCATCTCCAAAGAAGCAGACAACGATTTTACCGGTATTTTGCAXTT
GCTGACTGAGGGCTGCACCGACAGCGATCCCCATACCACCACCTACGATACCATTGGCAC 
CAAGGTTCCCAGCATCAAGGTCAGCGATATGCATAGATCCACCTTTCCCTTTACAGGTTC 
CAGTGTATTTACCAAGGATTTCAGCCATCATTCCGTTGAAGTCAATCCCTTTAGCAATAG 
CTTGCCCGTGTCCACGGTGGTTTGAGGTAATCAGATCATCTGGATTGAGAGCTACATAG 

ORF Predictions: 

ORF # Start End Direction Length 
1 439 1365 R 309 aa 

> 3858264-1 ORF translation from 439-1365, direction R 
VTPLSLLCLRKCVRDENVFLMGEDVGVFGGDFGTSVGMLEEFGPERVRDCPISEAAISGA 
AAGAAMTGLRPIVDMTFMDFSVIAMDNIVNQAAKTRYMFGGKGQVPMTVRCAAGNGVGSA
AQHSQSLESWFTHIPGLKVVAPGTPADMKGLLKSSIRDNNPVIILEYKSEFNQKGEVPVD
PDYTIPLGVGEIKRQGTDVTVVTYGKMLRRVVQAAEELAEEGISVEIVDPRTLVPLDKDI
IINSVKKTGKVVLVNDAHKTSGYIGEISAIISESEAFDYLDAPIRRCAGEDVPMPYAQNL
KMCNDSNS* 

Description : 

2-OXOISOVALERATE DEHYDROGENASE BETA SUBUNIT (EC 1.2.4.4)
(BRANCHED-CHAIN ALPHA-KETO ACID DEHYDROGENASE COMPONENT BETA
CHAIN (E1)) (BCKDH E1-BETA). - BACILLUS SUBTILIS.

Assembly ID: 3858610 
Assembly Length: 1078bp

> 3858610 Strep Assembly -- Assembly id#3858610 

CTAACCCTNGACGGGGCCGCTATCATCAGTCAAACAGCTAAAAATCTTGTCTGCAAAAGT 
CTCGATTAACTGAGCTTTTACAAAAGCCGTATTTCCTGGAATAACTTGGAGATTGATCAT 
CTTATCCATCAATTCAGCCGATTCGATATTGTCTTCAGCCAGTTGCAGACTTTTTACGAT 
TGATTTTGGCAATTCGTAGACATAGGTGTTGTCTCTCAAAGGAATTTTGACAATACCTAA 
CTCTTTGATATCTCGGGATACCGTCGCCTGAGTGGCAGTGATACCTGCTTCTTTCAAATG 
TTCTACAATTTCTTCTTGCGTGCCGATTTGATAATCTGTCACCAATCTTCTAATTTTTTC 
AAGTCTCTCTTTTTTATTCATTTTTAAATTGACTATGCGCCCTCTCTACTGCTTCTTTAA 
TCTCAGCAAGAATCTGATTGCTTGCTGACTTTTCTTTTTTCAAATACACTAAAAATTCAA 
TATTTCCATGTCCACCTTGGATGGGAGAAAAGTCCAAGCCAAGGACTGAAAAACCTGCCT 
CTACTGCCATAGCTGTTACAGATTCAAGGACATTCTGATGAATCTTAGCATCTCGAATAA 
TTCCATTTTTCCCAATCTGCTCACGTCCTGCCTCAAACTGAGGTTTGACAAGTGCTACCA 
CCTGACCTTGATCAGCCAAGACACGGTGCAAGGCTGGCAAAATCAGACTAAGGGAAATGA 
AACTCACATCAATACTGGCAAAGCTCGGCTCCTGCTCGAAATCAGTCTTTTCAGCATAGC 
GGAAATTGAACTGCTCCATGCTGACAACTCGTGGGTCTTGGCGTAATTTCCAAGCCAACT 
GATTGGTACCAACATCGACTGCAAAGACCAACTTGGCACTATTCTGTAGCATGACATCGG 
TAAAACCTCCAGTAGAGGCCCCGATATCAATCGTAGTCGCGCCATCCACCGACAAATGAA
AGACCTGCAAGGCCCTTTTCCAGTTTCAAACCACCACGGCTGACATACTTGAGTTTCTCC 
CCCTTGAGTTTTAATTCGGTGTCATCTGGAATTTCTCTCCTGGCTTGTCAAACCGTTC 

ORF Predictions: 

ORF # Start End Direction Length 
1 374 949 R 192 aa 

> 3858610-2 ORF translation from 374-949, direction R 
VDGATTIDIGASTGGFTDVMLQNSAKLVFAVDVGTNQLAWKLRQDPRVVSMEQFNFRYAE
KTDFEQEPSFASIDVSFISLSLILPALHRVLADQGQVVALVKPQFEAGREQIGKNGIIRD
AKIHQNVLESVTAMAVEAGFSVLGLDFSPIQGGHGNIEFLVYLKKEKSASNQILAEIKEA
VERAHSQFKNE* 

Description : 

cytotoxin/hemolysin ORF2 tly - Serpula hyodysenteriae 

Assembly ID: 3858716 
Assembly Length: 928bp

> 3858716 Strep Assembly -- Assembly id#3858716

ACTTTCCTGACCTCTGTTTCCAAATAATCTTCCAAATGGACAGAGATCTACCGTTGTTTG 
CATCGATAGCTGAGGTCTTTTTTAGAAAATACCATCACTTTTAGAAAATATAAACACATT 
TTTCGGATAAGATTAAGGTTAAAAGCAGCTCGTTTATCCAGGGTCTGATGATGGTCTTCA 
CGATAAACCACATCCAATAACCAATGCATACTTTCTGCTGACCAATGACCTCGAACACTA 
TGGCAAAAGGTCATCAACATCAAGCTTAAAGTTAAAGATAAAATAGCGAACGTCTTGACT 
TGTAATACCATCTCTATCAATAGTATTACGAGTCATTCCAATTCCACGCAATTTATGCCA 
TTTGGGATGGTTTTGACACAACCACTTAACATCAGAAGACACCCAGTATTCTCGAACTTC 
AATCTATCCTCTTTCTATATTCTAACTGAAAGGACAATTCAATGATTCATTTAATAATGA 
TTAGCGCCATTGCTCTAGCCATTGGAATTGGTTACCGCACCAAAATCAATATTGGCCTGC 
TGGCTATTGCTTTTTCTTACCTCATCGCAACCACTCTCATGGGATTAAGTCCCAAAGAAC 
TTCTTCATTTTTGGCCAACCTCACTCTTTTTTACCATTTTTAGCGTCTCTCTCTTTTATA 
ACGTTGCAACAACTAACGGTACTCTTGATGTTTTGGCTCAACACATTCTCTACCGCACAC 
GCACCCACCCTAACGCCCTCTACATGATTTTATACCTGATGGCAACCCTTTTGTCTGCTT 
TAGGTGCTGGATTTTTCACTACTATGGCCGTTTGCTGTCCTCTAGCGATTACCCTCTGTC 
AAAAAGCGGACAAACACCCTTTGATTGGAGTCAAAGCGTCAATGGGAACTTCAGGAAGGG 
TAATTTGATAACCAAAGGAATAAAATTT 

ORF Predictions: 

ORF # Start End Direction Length 
1 238 402 R 55 aa

> 3858716-1 ORF translation from 238-402, direction R
VSSDVKWLCQNHPKWHKLRGIGMTRNTIDRDGITSQDVRYFIFNFKLDVDDLLP*

Description : 
unknown 

Assembly ID: 3859124 
Assembly Length: 847bp 

> 3859124 Strep Assembly -- Assembly id#3859124 

AAAAACGCACCATATCAAAAACTAAAAAGTTTGATATCATGCGTCATGTCTTAAACTAAT 
TGACTATACTTTCTATTCAAATGAGCTTTTAACCAATTGATTGAGCCAATCCACTCTTAA 
AACCAAAGGAGCAATTTCTCGGCTTAGCTGACTCTTCTCGGAATCTGAACCATGTACAAC 
ATTTTGGATAATCTCATTTTCTCCAGCAGCTTTTGCAAAATCACCTCGAATAGTGCCTGG 
TAAAGCTTCTTCTGGACGAGTTGCACCCATCATGGTCCGCCAAGTTTCGATTACTTTGGG 
ACCAGAAATGACACCCACAAGAACTGGACCTGAAGTCATGAATTCACGAATCGGTGGGTA
AAAACTCTGACCAACCAAGTCCTGATAGTGCTGGTCAATCAACTCTTCTGAAAACCTGTG 
AACGAAACTCCAATTTTTCGATTGTAAATCCACGTTGTTCGATGCGCTTTAACACTTCAC 
CCACTAGCCCTCTTTTTACACCATCTGGTTTGATGATAAAGAATGTTTGTTCCATACCCG 
TCTCCTTTGTCAGCTTCTTTCTTTTATTTTACCACATCTCGTGGAAAAATGGAGAAAGTT 
TTCAGAAGAGAGAATGAGAGAACCCTCGGGTTCTCTCATTCTCTCTTATTCTACTGTTTC 
TTCCACAGTGTCAACGGCAGTATCCACAACTACTTCTGTTGTTTCTTCATTTCCTTCTTC 
CTCTACTGGAGGATTAAGGTATTCTTCTTCGTTGACAGCATGTGGTTCAAGGTTACGGTA 
ACGGGCCATACCAGTACCAGCTGGGATGATCTTACCGATGAATAACATTTTCCTTTAAAT 
TCCAAGG 

ORF Predictions: 

ORF # Start End Direction Length 
1 73 453 R 127 aa 

> 3859124-1 ORF translation from 73-453, direction R 
VDLQSKNWSFVHRFSEELIDQHYQDLVGQSFYPPIREFMTSGPVLVGVISGPKVIETWRT 
MMGATRPEEALPGTIRGDFAKAAGENEIIQNVVHGSDSEKSQLSREIAPLVLRVDWLNQL 
VKSSFE* 

Description : 

NUCLEOSIDE DIPHOSPHATE KINASE (EC 2.7.4.6) (NDK) (NDP KINASE)
(ABNORMAL WING DISCS PROTEIN) (KILLER-OF-PRUNE PROTEIN). -
DROSOPHILA MELANOGASTER (FRUIT FLY).

Assembly ID: 3859244 
Assembly Length: 578bp

> 3859244 Strep Assembly -- Assembly id#3859244

ACAACCTAACTACCGNCTAATTCAGCGCGAACTTCTGCAGTAGCTGCTTCAACAACTTCA 
CGACGTGAAAGGATGAAGCGGTTTTCTTTAGCGTTAACTTCTTTGATTTTAGTATCAAAT
TCTTGACCTACAAAACGCTCAGCGTTACGTACGAAACGAGTATCCAACATTGAAGCTGGG 
ATAAATCCACGAACACCTTCAAATTCTACTGAAAGTCCACCTTTAACGGCACGCGTTCCT 
TTAACAGTAACAACTTCTTCTTCGCGACCAACAAGTTTGTCCCATGCTTTGCGAGCTTCA 
AGGCGTTTTTTAGATGACAAGGTATGTAACTGTATCAGTATCTTTACCAACTACTTGACG 
AAGTACAAGAACATCCAATACTTCTCCTACTTTAACAAAGTCATTGATATCTGCATCACG 
ATCGTTTGTCAATTCGCGAAGAGTCAAGACACCCTTCAACACCAGTTCCCAGAAGAATGC 
AACGTTAGCTTGAGTCGCATCAACTGTCAATACTTCAGCACTAACACATCACCAGTCTCA 
ACTTGACTNACGCTATTGAGCANATCTTCAAATTCGAT 

ORF Predictions: 

ORF # Start End Direction Length 
1 310 462 R 51 aa 

> 3859244-2 ORF translation from 310-462, direction R 
VLKGVLTLRELTNDRDADINDFVKVGEVLDVLVLRQVVGKDTDTVTYLVI*

Description : 
unknown 

Assembly ID: 3859250 
Assembly Length: 888bp

> 3859250 Strep Assembly -- Assembly id#3859250

GTAGTTATAGTAGGGGTCGGATTGAAATGCCACNGCGCTTCTTGGAGTTTCTGATACCGT 
TTAAAATAGCGTTGGGCATTCTGGTTGGGAGTCAGAGCCTTATCAAGCGCAATCATGATA 
GGTTGGTTGGTATAGTAGTTGTCTAGGATAACCTGGTTCTTGGTCGTTAGGCACCTGGTG 
GAGGAAGGTTGTCAGCAATTCTCCTTTTTGACGAAATTCTTCAGCGTTGTCTGTCGCCAG 
TAACTATTTTTCCTGTTTTTTGAGTTTGTGTCGGTTTTTCTGAAGTTCATTTTCAACACG 
ACGAATCAGTTCACTGGCCTGCTGTTTGACGCGGTCGCGCTCAGCCTTATCCTTATAGTA 
GGTGTCCAACAAATCAGAAAGATTTGCAAAAGGCTCTCCCACCTGATTTGCAAAAGGAAC 
TGGACTGAAGGAAGTCTCAGTCAAGCATGGCTTGGTTTCCTGATTGAAAAAATTTCGGAA 
AGCGGAAAGTTTTTCACTAACCAGTATCCTTTCCAATTCATTTGCCGTATCGCGTCCCAG 
ACCTTGAAAGAGGCTTTGAAGATTTTTTGCTGTTAGTTCTTGGGTTTGCAGGATTTCAAA 
GAGCTTTTCATCCTTGATAGTAAAAGGATTGAGAGATTCTGTACTTGGCGGAGCGATATA 
GGTCGATCCTGGAAGTAAGGTGCGGTAGCTATTTTGTGAAAAGCCGACGTGTTTGATAAC 
TTCGAGGATTTTATGACTGCTTTTATCCGACCAGTTAGAATATTACTGTGTTTCCCCATA 
ATTTCGATAATCAAGGTAGCCTGGATATGGTCTCCAATCTCGTTTTTATTGGAAACTGTA 
ATTTCCACAATACGGTCATTTTCCACTTGCTCAATCGACTCAATCAGG

ORF Predictions: 

ORF # Start End Direction Length 



1 244 402 R 53 aa 

> 3859250-1 ORF translation from 244-402, direction R 
VGEPFANLSDLLDTYYKDKAERDRVKQQASELIRRVENELQKNRHKLKKQEK* 

Description : 

STRFBP5A NCBI gi: 496253 - Streptococcus pyogenes.
Fibrinogen/Fibronectin binding protein 

Assembly ID: 3859588 
Assembly Length: 513bp 

> 3859588 Strep Assembly -- Assembly id#3859588 

ATCGAATTTTGTTCTTTCATAGAGAGCTACCTGAGTTCTATTCAAGCTCAGGTAGTACTT 
TCTTATAAACTAGACAAACTAACTGTCATTCTACCATCAGATTACAAGACATCATCGTCA 
CTCACCTTGGAATTCAATGTCGTACCCCAATGGGTAATTTTACGGTGGGGTTGAGCTAAA 
ATTGGTCTGTTTTCATAGATTGTTTGCCATCTATTCCATAGTAGGCCCGTCTTTTTCTCA 
ATCTTAACTCGCAGATTTCTCATATTTTCTTTGATTGGGAGGTTGAGGACAAAACCTGCA 
GTCTGGTTGCGACCGTTTCCTTCCCAAGAATGACTACGAACAACTTGGTTTCCATCTTTA 
TCTACTGGAACTTCTTCCCAAGTTATGGAGTAGCGGGCAATGTAAGCTCCACTGTGTTGA 
ATTATCAATGTTTTATCTTTCACAGGGAGTCTGACTGATTGGTTGAACTGGCTTAGAAAC 
TTGTGTCGCCGTTTCAGCATTCGTAGCTATAAA 

ORF Predictions: 

ORF # Start End Direction Length 



1 102 443 R 114 aa 

> 3859588-1 ORF translation from 102-443, direction R
VKDKTLIIQHSGAYIARYSITWEEVPVDKDGNQVVRSHSWEGNGRNQTAGFVLNLPIKEN
MRNLRVKIEKKTGLLWNRWQTIYENRPILAQPHRKITHWGTTLNSKVSDDDVL*

Description : 

PNEUMOLYSIN (THIOL-ACTIVATED CYTOLYSIN). - STREPTOCOCCUS
PNEUMONIAE.

Assembly ID: 3859774 
Assembly Length: 214bp 

> 3859774 Strep Assembly -- Assembly id#3859774

ATCGAATTCTAACATGTGCTTCTCCTTCTATTGTTCCTATCTTTAAAATCTACTCCTTCA 
TGCTCCAAGAGCCAAGCTTTCTTTTCCACTCCTGCAGCATAACCTGTCAGACGCTTGCCT 
GCTCCCAACACACGATGACAAGGTACTAGGATAGACCAAGGATTGCGTCCCACTGCTCCA 
CCAATTGCTTGAGCAGAAGCCACTTGCAGGTCTT 

ORF Predictions: 

ORF # Start End Direction Length 



1 9 131 R 41 aa 

> 3859774-1 ORF translation from 9-131, direction R 
VLGAGKRLTGYAAGVEKKAWLLEHEGVDFKDRNNRRRSTC* 

Description : 

GLUTAMATE RACEMASE (EC 5.1.1.3). - ESCHERICHIA COLI.

Assembly ID: 3860140 
Assembly Length: 1084bp 

> 3860140 Strep Assembly -- Assembly id#3860140 

CTCCAGCAATGGATCCAAGTATGATGGGCGGGATGATGTAAGCTTTCTATAGAAAACACC 
TTATAAAAAACACGAAAGGAGGGAATGACTAACCCTTCTTTTTATAATATTCACTTCTAA 
GATTGATGGTGAGCTCTCCTAACTTATATGATAAAATAAGACTAGAGGAAAGGAGAAGAA
CATGATCGATGTACAAGAAATTCTGTGCAAGATGACCCCCAATCAGAAGATTAATTATGA 
CCGTGTCATGCAGAAAATGGTACAAGCATGGGAAAAAAATGAGTAGCGGCCAACCATTCT 
CGTGCATGTTTGCTGTGCCCCTTGTAGTACCTATACACTAGAATATTTGACCAAGTATGC 
AGATGTGACCATCTATTTTGCCAATTCTAATATCCATCCCAAGGCAGAATACCATAAGCG 
GGTCTATGTCACCAAGAAATTTGTTAGTGATTTTAATGAGCAGACAGGAAATACGGTTCA 
GTACCTAGAAGCTCCCTACGAACCCAATTAATACCGAAAACTAGTTAGGGGGCTAGAGGA 
GGAGCCCGAAGGTGGCGACCGTTGCAAGGTTTGTTTTGACTACCGACTGGATAAAACAGC 
GCAAGTGGCTATGGACTTGGGCTTTGACTACTTTGGTTCAGCCTTGACCATCAGTCCTCA 
TAAGAATTCTCAAACTATCAATAGCATCGGAATCGATGTGCAAAAAATTTACACGCCCCA 
CTATCTTCCCAACGATTTCAAGAAAAATCAAGGCTACAAACGTTCAGTAGAGATGCGTGA 
GGAGTATGATATCTATCGTCAATGTTATTGTGGCTGCGTCTATGCAGCCCAAGCCCAGAA 
TATTGACCTGGTTTAAGTTGAGTAGGACGCCACAGCATGCTTGCTGGATAAGGATGTTGA 
GAAAGACTATTCTCATATCACATTTATAGTAGATTGAAACTAGAATAGTACACCTTTACT 
TCTCAAACATTGTTAGAAATCGATTCGGCTGTCCTTATTTCATTTTAATATACTGGTACG 
AAATTAGATATATCAATGATAACTTGCCTCAAGGTAGGTTTTTTGATAGTAGAAAAGCGA 
TAGA 

ORF Predictions: 

ORF # Start End Direction Length



1 302 511 F 70 aa 

2 605 856 F 84 aa

> 3860140-1 ORF translation from 302-511, direction F 
VHVCCAPCSTYTLEYLTKYADVTIYFANSNIHPKAEYHKRVYVTKKFVSDFNEQTGNTVQ 
YLEAPYEPN* 

Description : 
unknown 

> 3860140-2 ORF translation from 605-856, direction F 
VAMDLGFDYFGSALTISPHKNSQTINSIGIDVQKIYTPHYLPNDFKKNQGYKRSVEMREE 
YDIYRQCYCGCVYAAQAQNIDLV* 

Description : 
unknown 

Assembly ID: 3860206 
Assembly Length: 1124bp 

> 3860206 Strep Assembly -- Assembly id#3860206 

ATCGAATTCATTGACTGCCTGAAAAGACTTCAACTCGTCTGCCTGATAACCGAAAGACTT 
GGTTACTTTGATACCTGATACGGACTCCTGTACCTTGTTATTGAGTTCAGAAAAAGCAGC 
TTGGGATTCGCCAAAGGCCTTATGAGTCTTTCTCCCTAGGCGACTAGTCGTATAGGCCAT 
GAAAGGTAGGGGGAGAATGGCAACAAGAGTCATCTGCCATGAGATGCTAAAGAGCATGGT 
CAACAAAGTCACCAGAGCCGTGATAGAGGCATCCACCGCAGACATGACACCGCCACCTGC 
TAAACGAGTCAAGGAATTGATATCATTGGTTGCGTGTGCCATCAGATCACCCGTCCGATA 
GGTTTGATAAAAGGCTGACGACATTTTTGTGAAATGCTTAAACAAGCGAGACCGCATGAT 
CTGTCCCAAGCAATAAGAGGTCCCAAGGATATACATACGCCACACATAGCGCAAATAGTA 
CATACCAAAGGCTGCAAGTAGCAAGTAAAATAGGCTAAGAAGGAGGTCCTGCTGGGTTAA 
TTGCCCCGATGTGATGGCATCAATAACCCGCCCCATAACCATAGGAGGAATGAGATTGAG 
GACGGAAACCAAGACCAGGGCCACAATCCCGACTAGATAACGGCGTTTTTCTAACTTGAA 
AAACCACCAAAATTTTTGAATAATGGACATAAAATCCCTTTCTGGATTGCAAATAGAAAC 
CTGAGGCCAATACTCAATGGAAAATCAAAGAGCAAACTAGGAAACTAGCCGCAGGCTGCT 
CAAAGCACTGCTTTGAGGTTGTAGATAGAACTGACGAAGTCAGTAACCTACATACGGCAA 
GGCGACGTTGACGCCGTTTGAAGAAATTTCCGAAGAATACAAGACCCCAGGTTTTTCTTA 
TTTATAAGTTACCACTGTAACAGCACCCTTGTCATATTCAGCAATAAAGATATTGGCTAC 
ATTGTCATGCCCTTGTTTACTGAGGTTATCAAGCAACCACTCCTCGCTACGAACAATCGA 
TCCCAAGACATCTACTTGAATCACACCGTCAGTCACAACTGGATACTTAGGATTTTCATC 
TCCCATTTGCACAACGATGAGTTGCCCATTTTGCTCTTGCACAG 

ORF Predictions:

ORF # Start End Direction Length 



1 898 1056 R 53 aa

> 3860206-2 ORF translation from 898-1056, direction R 
VTDGVIQVDVLGSIVRSEEWLLDNLSKQGHDNVANIFIAEYDKGAVTVVTYK*

Description : 
unknown 

Assembly ID: 3860270 
Assembly Length: 1242bp 

> 3860270 Strep Assembly -- Assembly id#3860270

TTACCTTCATTGCAGCCATTATTGGTTCTTGTGTCAGCCAGATTTTAAGTATTCTTTATA 
AGACACCTGCTGTGGTCTTTATCTTGGCCATTTTGGCACCGCTGGTTCCAGGTTATCTCT 
CCTACCGAACAACTGCCTTTTTTGTGACAGGGGACTATAATAAAGCACTGGCAAGTGCGA 
CCTTGGTTGTCATGTTGGCTTTGGTAATCTCTATTGGAATGGCTAGCGGAACAGTGATTC 
TCAGACTGTATCATTATATAAAAACACATCGAGTATCGTAGACTTTACAGAAATAAAAGA 
ATTTTCTGAAAAATGAGATAAATAAATTAACAACGCTTTCTATATGTGCGAGAATACCGC
ACTTATGAAGAAATTGCGGCTGATTTTGGTATCCACGAAAGCAACTTAATCCGTCGGAGC 
CAATGGGTTGAAGTAACTCTTGTTCAAAGTGGTGTTACGATTTCAAAAACTCATCTTAGT 
GCTGAGAATACGGTGATTGTGGATGCAACAGAGGTAAAAATCAATCGCCCTAAAAAACAA 
TTAGCGAATGATTCTGGTAAAAAGAAATTTCACGCTATGAAGGCTCAGGCGATTGTCACA 
AGTCAAGGGAGAATTGTTTCTTTGGATATCGCTGTGAACTATTGTCATGATATGAAGTTG 
TTCAAAATGAGTCGCAGAAATATCGGACAAGCTGGAAAAATCTTGGCTGATAGTGGTTAT 
CAAGGGCCCATGAAGATATATCCTCAAGCACAAACTCCACGTAAATCCAGCAAACTCAAG 
CCGCTAATAGCTGAAGATAAAGCTTATAACCATGCGCTATCCAAGGAGAGAAGCAAGGTT 
GAGAACATCTTTGCCAAAGTAAAAACGTTTAAAATGTTTTCAACAACCTATCGAAATCAT 
CGTAAACGCTTCGGATTACGAATGAATTTGATTGCTGGCATTATCAATTATGAACTAGGA 
TTCTAGTTTTGCAGGAAGTCTATTATTTTCCTTATTGTCTGTAAGTCTACTGACCTTGTT 
GTTTATCCCAGTCATGGTTTCTAGTTCGGGCTCAGAGTTTCAAAGTGGATGGCAAGAGCA 
TCAATTGATTGCTGAGAAGGTTAGTAAAACACTTGACAAGACATTTGATAAGGATGTCAG 
AAAAATTCCGACCAGTCAGTTTTATCAAAAATTTGTAGATGAGATGGGAAGGATTTACTC 
AGGAAATTTGATCCTCCCAGGAGCTGATAACTGTGAATGGAG 

ORF Predictions: 

ORF # Start End Direction Length 



1 346 966 F 207 aa 

> 3860270-1 ORF translation from 346-966, direction F 
VREYRTYEEIAADFGIHESNLIRRSQWVEVTLVQSGVTISKTHLSAENTVIVDATEVKIN
RPKKQLANDSGKKKFHAMKAQAIVTSQGRIVSLDIAVNYCHDMKLFKMSRRNIGQAGKIL 
ADSGYQGPMKIYPQAQTPRKSSKLKPLIAEDKAYNHALSKERSKVENIFAKVKTFKMFST 
TYRNHRKRFGLRMNLIAGIINYELGF* 

Description : 

ISL2 protein - Lactobacillus helveticus (Probable transposase) 

Assembly ID: 3860438 
Assembly Length: 1575bp

> 3860438 Strep Assembly -- Assembly id#3860438

GTGATGGGGCCTCAGGGAAATGGTTTTGACTTGTCTGACCTTGATGAGCAGAATCAGGTT 
CTCCTTGTTGGTGGTGGGATTGGTGTTCCACCCTTGCTTGAGGTGGCCAAGGAATTGCAT 
GAACGTGGAGTGAAAGTAGTGACAGTCCTCGGTTTTGCTAATAAGGATGCTGTTATTTTG 
AAAACGGAATTGGCTCAGTATGGTCAGGTCTTTGTAACGACAGATGATGGTTCTTATGGC 
ATCAAGGGAAATGTTCCGTTGTTATCAATGATTTAGATAGTCAGTTTGATGCTGTTTACT 
CGTGTGGGGCTCCAGGAATGATGAAGTATATCAATCAAACCTTTGATGATCACCCAAGAG 
CCTATTTATCTCTGGAATCTCGTATGGCTTGTGGGATGGGAGCTTGCTATGCCTGTGTTC 
TAAAAGTACCAGAAAGCGAGACGGTCAGCCAACGCGTCTGTGAAGATGGTCCTGTTTTCC 
GCACAGGAACAGTTGTATTATAAGGAGAAAATTATGACTACAAATCGATTACAAGTGTCT 
CTACCTGGTTTGGATTTGAAAAATCCGATTATTCCAGCATCAGGCTGTTTTGGCTTTGGA 
CAAGAGTATGCCAAGTACTATGATTTAGACCTTTTAGGTTCTATTATGATCAAGGCGACA 
ACCCTTGAACCACGTTTTGGGAATCCAACTCCAAGAGTGGCAGAGACGCCTGCTGGTATG 
CTCAATGCAATTGGCTTGCAAAATCCTGGTTTAGAGGTTGTTTTGGCTGAAAAGCTACCT 
TGGCTGGAAAGAGAATATCCAAATCTTCCTATTATTGCCAATGTAGCTGGTTTTTCAAAA 
CAAGAGTATGCAGCTGTTTCTCATGGGATTTCCAAGGCAACTAATATAAAAGCTATCGAG 
CTCAATATTTCTTGTCCCAATGTTGACCACTGTAATCATGGACTTTTGATTGGTCAAGAT 
CCAGATTTGGCTTATGATGTGGTGAAAGCAGCTGTGGAAGCCTCAGAAGTGCCAGTTTAT 
GTCAAATTAACCCCGAGTGTGACCGATATCGTTACTGTCGCAAAAGCTGCAGAAGATGCG 
GGAGCAAGTGGCTTGACTATGATCATACTCTGGTGGGATGCGCTTTGACCTCAAAACCAG 
AAAACCAATCTTGGCCAATGGAACAGGTGGAATGTCAGGTCCAGCAGTTTTCCAGTAGCC 
CTCAAACTCATCCGCCAAGTAGCCCAAACAACAGACCTGCCTATCATTGGAATGGGGGGA 
GTGGATTCGGCTGAAGCTGCCCTAGAAATGTATCTGGCTGGGGCATCTGCTATCGGAGTT 
GGAACAGCTAACTTTACCAATCCTTATGCCTGCCCTGACATCATCGAAAATTTACCAAAA 
GTCATGGATAAATACGGTATTAGCAGTCTGGAAGAACTCCGTCAGGAAGTAAAAGAGTCT 
CTGAGGTAAACTGCAATCAATCTGTTCTTGATTTTTTATTAGTTTGTAATATGAATTTAG 
GAGAATTTTGGTACAATAAAATAAATAAGAACAGAGGAAGAAGGTTAATGAAGAAAGTAA 
GATTTATTTTTTTAG 

ORF Predictions: 

ORF # Start End Direction Length 

1 1 276 F 92 aa

2 460 1128 F 223 aa

> 3860438-1 ORF translation from 1-276, direction F
VMGPQGNGFDLSDLDEQNQVLLVGGGIGVPPLLEVAKELHERGVKVVTVLGFANKDAVIL
KTELAQYGQVFVTTDDGSYGIKGNVPLLSMI*

Description : 
unknown 

> 3860438-3 ORF translation from 460-1128, direction F 
VKMVLFSAQEQLYYKEKIMTTNRLQVSLPGLDLKNPIIPASGCFGFGQEYAKYYDLDLLG 
SIMIKATTLEPRFGNPTPRVAETPAGMLNAIGLQNPGLEVVLAEKLPWLEREYPNLPIIA
NVAGFSKQEYAAVSHGISKATNIKAIELNISCPNVDHCNHGLLIGQDPDLAYDVVKAAVE
ASEVPVYVKLTPSVTDIVTVAKAAEDAGASGLTMIILWWDAL*

Description : 

DIHYDROOROTATE DEHYDROGENASE (EC 1.3.3.1) (DIHYDROOROTATE
OXIDASE) (DHODEHASE). - BACILLUS SUBTILIS.

Assembly ID: 3860544 
Assembly Length: 776bp 

> 3860544 Strep Assembly -- Assembly id#3860544 

CTAAGATATCAGAATAACAACGAAATCGAAGCATTAAAAACAAATATTACTTCTAAGAAT
AGCGAGATTGATAGTCAACAAAGCAATATTAAGGATATGACCGTACCTATAATGATCCAA 
CTTCTCAGGCTTATAATATTTATGCTCAATTAATTAGTGAGTTAGGTACTGCTCGTTCAA 
ACAACAATAAAAGTATTACAGAGCTTGAGGCTAATCTTGGAGTGGCAACAGGTCAAGATA 
AAGCTCATAGTATATTAGCGTCAAATGAAGGTACTCTGCATTATCTGGTACCTTTGAAAC 
AAGGAATGTCTATTCAGCAGGGGCAAACGATAGCAGAAGTTTCAGGGAAAGAAAAAGGTT 
ACTATGTAGAGGCTTTTGTACTTGCGAGTGATATTTCTCGTGTTTCAAAAGGAGCAAAAG 
TTGATGTTGCTATTACTGGTGTGAATAGTCAAAAATATGGAACACTAAAGGGACAAGTCA 
GACAGATTGATTCAGGAACAATTTCCCAAGAAACGAAAGAGGGGAATATTAGCCTCTATA 
AAGTCATGATAGAATTAGAAACCTTAACTCTAAAACATGGAAGCGAGACGGTCATACTCC 
AAAAGGATATGCCAGTTGAAGTGCGGATTGTCTATGATAAAGAAACCTATCTTGATTGGA 
TTTTAGAAATGTTAAGTTTCAAGCAATAATTGGTTTTAAACCTTAGGTAACCTATAAAAA 
CAAATAAGGTAGAGAAAGGATATTTTATCTAAGTTAGCTCACATTACTGCCATTCC 

ORF Predictions: 

ORF # Start End Direction Length 



1 222 689 F 156 aa 

> 3860544-1 ORF translation from 222-689, direction F
VATGQDKAHSILASNEGTLHYLVPLKQGMSIQQGQTIAEVSGKEKGYYVEAFVLASDISR 
VSKGAKVDVAITGVNSQKYGTLKGQVRQIDSGTISQETKEGNISLYKVMIELETLTLKHG 
SETVILQKDMPVEVRIVYDKETYLDWILEMLSFKQ* 

Description : 
unknown 

Assembly ID: 3860558 
Assembly Length: 1487bp 

> 3860558 Strep Assembly -- Assembly id#3860558 

CTGGCCTTTCTCCACCAAAATTGTTCCTTGAGGGAAGGAAGTCAGAACACTAGCCGTTGC 
ATCTTCCTTTTGCTTTTCAATCGTAATTCCAGATAATTTTTCCCATTCTTTTTGGTGACC 
CCGGGAGGCAGGATTGAATGGCTTGAGGGAAATGACAAACTTGTCCTAGCAAGAATGGTC 
AAGGCACCTCCGTCTACAATCAAAATCTGATTTGGGCTTAAATTAACAAAGACCTGTTTT 
ACTAGATTTTCTCCAGAAGCATCGTCTCGTAAACCAGGCCCCAGCAAGATAACTTCTGCC 
TTCTCCAATTGCTCTTTTAACAATTGCTGGTCTTGAAGAGAAAAGGCCATAGGCTCAGGT 
AAATGGCTGTGCAGAGCCGGGATATTTTCCCTGTCCGTTCCAACGGTCACCAATCCTGCA 
CCGCTTTTTACAGCTGCTAAAGCAGCCATGATGATGGCACCTCCATAAGGATAAGTACCA 
CCAAGCAGCAGCAGACGACCATAATCTCCTTTATGACTTGAACGAGAACGTTCAATAATA 
ACTTTTTCTAGTAAGGTTTGATTAATCACTTTCATCCTTTTTCCCTCTCACTTTTATTAT 
ACAACAAAAAGGAGACGCAGACCTCCTTTTGTAATCTTATATCTAAAATTTAATATTCAT 
TTCTGCCATTTTAGATATAGCTATAGAAAATACACTCTATTAATCGAATGTTTCTCTTAT 
TTTCTATCCAATGTCCGAAGTGCTGCTTGATAAGTTTGCTCCATCAGCATGGTAATGGTC 
ATAGGACCGACACCTCCAGGGACTGGCGTGATATGGCTAGCAAGTGGTGCAACTGCCTCA 
TAATCAACATCTCCACAGAGCTTCCCATTTTCATCTCGGTTCATCCCAACGTCAATGACA 
ACCGCACCTGGTTTGACAAAGTCAGCAGTCACAAACTTGGCGCGGCCGATTGCGACTACA 
AGAATATCTGCTTTAGCAGCCACCTTGGCAAGATTATGAGTTCGTGAGTGGGCCAAGGTT 
ACTGTCGCATTTTTAGCCAAAAGAAGCTGAGCCATAGGTTTTCCAACGATATTTGAACGA 
CCGATTACGACCGCATTTTTACCTTCCAAGTCAATCCCATATTCATGAAACATTTCCATA 
ATTCCTGCAGGTGTCGAGGGAATCATGACTGGATGTCCAGACCAAAGACGTCCCATGTTT 
AGGGGATGGAAACCATCCACATCCTTTTCTGGGTCAATGGCTAATAAAACCGCCTCTTCA 
TCGATATGTTTTGGTAATGGCAACTGGACCAAAATCCCATGCCAAGCTGGATCCTGATTA 
TATTTAGCAATCAGGTCTAACAATTCCTCTTGAGTAATGGTCTCTGGAACTCGCACTACT 
TCGGTACGGGAACCAGCCGCAAGAGCTGACCTCTCCTTGTTGCGAACGTTAAACTTGGCT 
GGCTGGATTATCCCCAACCAAAATCACTACCAAACCAGGCACTAGAG 

ORF Predictions: 

ORF # Start End Direction Length 



1 717 1376 R 220 aa 

> 3860558-2 ORF translation from 717-1376, direction R
VRVPETITQEELLDLIAKYNQDPAWHGILVQLPLPKHIDEEAVLLAIDPEKDVDGFHPLN 
MGRLWSGHPVMIPSTPAGIMEMFHEYGIDLEGKNAVVIGRSNIVGKPMAQLLLAKNATVT
LAHSRTHNLAKVAAKADILVVAIGRAKFVTADFVKPGAVVIDVGMNRDENGKLGGDVDYE
AVAPLASHITPVPGGVGPMTITMLMEQTYQAALRTLDRK*

Description : 

5,10-methylene-tetrahydrofolate dehydrogenase (folD) homolog -
Haemophilus influenzae (strain Rd KW20)

Assembly ID: 3860568 
Assembly Length: 1634bp

> 3860568 Strep Assembly -- Assembly id#3860568 

CGTGCCTTGGCCAATGATCCAAAAATCTTGATTTCAGACGAGTCGCTTCAAATTTCGGCC 
CCTGGACCCTTAAGACCAACCCAAGCAGATTTTGGCCCTTGGTTGCAAGATTTGAACCAA 
AAATTAGGCTTGACTGTTGTCCTGATTACGCATGAAATGCAGATTGTCAAAGACATTGCC 
AACCGTGTTGCAGTTATGCAGGATGGGCATTTGATTGAAGAGAGTAGTGTGCTTGAAATC 
TTCTCAGACCCTAAACAACCTTTGACTCAAGACTTTATCTCAACAGCTACAGGTATTGAC 
GAAGCCATGGTCAAAATCGAGAAGCAAGAAATCGTGGAACACTTGTCTGAAAACAGTCTC 
TTGGTGCAACTCAAGTACGCTGGATCTTCAACAGACGAGCCACTTTTGAATGAATTGTAC 
AAGCATTATCAAGTAATGGCTAATATTCTCTATGGGAATATCGAAATCCTCGATGGTACT 
CCTGTTGGAGAATTGGTGGTGGTCTTGTCAGGTGAAAAAGCAGCGCTGGCAGGTGCTCAA 
GAAGCCATTCGTCAAGCAGGCGTACAGTTAAAAGTATTGAAGGGAGGACAGTAAGATGGA 
ATCATTGATTCAAACCTATTTACCAAATGTCTATAAGATGGGTTGGTCTGGTCAGGCAGG 
CTGGGGAACAGCTATCTACCTAACCCTCTATATGACAGTTCTTTCCTTCATTATCGGAGG 
CTTCTTGGGGCTAGTGGCAGGTCTCTTTCTCGTCTTGACAGCGCCAGGTGGTGTCTTGGA 
GAATAAAGTCGTATTCTGGATTTTAGACAAAATTACCTCAATTTTTCGTGCGGTTCCCTT 
TATCATCCTCTTGGCAATCTTGTCACCACTTTCTCACTTGATTGAAAAAACAAGTATCGG 
GCCAAATGCAAGCCCTTGTCCCACTTTCTTTTGCAGTCTTTGCCTTCTTTGCCCGTCAGG 
TGCAGGTTGTCTTGGCTGAAATGGATGGCGGTGTCATTGAGGCGGGCTCAAAGCGAGCGG 
AGCGACTTTCTGGGACATCGTGGGTGTTTACCTATCAGAAGGTCTTCCAGATTTGATCCG 
TGTGACGACTGTGACCTTGATTTCCCTTGTTGGGGAAACAGCTATGGCCGGTGCGGTTGG 
AGCTGGTGGTATCGGTAACGTAGCCATCGCTTATGGATTTAACCGCTACAATCACGATGT 
GACCATCTTGGCAACCATCGTTATCATTTTGATTATCTTTGCAATCCAATTCTTAGGAGA 
TTTCTTGACTAAGAAATTGAGCCATAAATAAAAAAGAGCCGTGTGGCTCTTTTTAACTGA 
TCAGATTTTCTGGGCAAATTTTTTACTCAAGGCTTGTCCAATCAAGGCACCCACTAGGGC 
TCCGATGACAATACTTGCGATAAATAGAAGGACAGTTCCAGGGTTTGGAGCGACCATGAT 
GCGGTCGATATATTCTTGGGATTTTCCTCTTGCCAGAAGAGTAGCCATATAGGCTTTGGG 
CGCAATCCACATAAGCAAGATTGGTCCTGTTGTACTAAAGGCGAAAATAATGAAAGAAAG 
GAAGTTCTTTGTTTTGTCCTTGTATTTTCCTAAATGAGCTACTCCATCTGCTAGGAGGCC 
ACAGATAATTCGAT 

ORF Predictions:

ORF # Start End Direction Length 



1 1040 1291 F 84 aa 

> 3860568-3 ORF translation from 1040-1291, direction F
VGVYLSEGLPDLIRVTTVTLISLVGETAMAGAVGAGGIGNVAIAYGFNRYNHDVTILATI
VIILIIFAIQFLGDFLTKKLSHK* 

Description : 
unknown 

Assembly ID: 3860582 
Assembly Length: 1087bp 

> 3860582 Strep Assembly -- Assembly id#3860582 

GGAATCATGATGATGTCACTGCTAAATGGTTTCTTAGAAAAAATATTTCCTGAGCGCTTA 
CAGATTAGTTTGGGCTTGCTGATTTTATCATTGAGCGGTACAGCTCCCTTCTGGTACCAA 
GCCTATCCCTTTGTCTTTGGAACACGGCTTCTCTTTGGTTTGGGTCTTGGGATGATCAAT 
GCCAAGGCCATTTCTATTATCAGTGAACGCTACCAAGGAAAAAGGCGAATTCAGATGTTA 
GGGCTACGCGCTTCTGCAGAGGTCGTTGGAGCTTCTCTCATTACCTTGGCCGTCGGTCAA 
GTTGTTGGCCTTTGGTTGGACAGCTATCTTTCTAGCCTATAGTGCTGGATTTTTGGTGCT 
GCCCCTTTATCTGCTCTTTGTCCCTTATGGAAAATCAAAGAAAGAAGTCAAGAAAAGAGG 
GAAGGAAGCAAGTCGTTTAACTCGAGAAATGAAAGGCTTGATTTTTACCTTAGCTATCGA 
AGCGGCAGTTGTAGTTTGTACCAATACAGCTATTACCATCCGTATTCCAAGTTTGATGGT 
GGAAAGAGGATTGGGGGATGCCCAGTTATCTAGTTTTGTTCTTAGTATCATGCAGTTGAT 
CGGGATTGTGGCTGGGGTGAGTTTTTCTTTCTTGATTTCTATCTTTAAAGAGAAACTGCT 
CCTCTGGTCTGGTATTACCTTTGGCTTGGGGCAAATCGTGATTGCCTTGTCTTCATCCTT 
GTGGGTGGTAGTAGCAGGAAGTGTTCTGGCTGGATTTGCCTATAGTGTAGTCTTGACGAC 
GGTCTTTCAACTTGTCTCTGAACGAATTCCAGCTAAACTCCTCAATCAAGCAACTTCATT 
TGCTGTATTAGGCTGTAGTTTCGGAGCCTTTACGACCCCATTCGTTCTAGGTGCAATTGG 
CTTACTAACTCACAATGGGATGTTGGTCTTTAGTATCTTAGGAGGTTGGTTGATTGTAAT 
CTCTATCTTTGTCATGTACCTACTTCAGAAGAGAGCTCTAGGATTGATTCCTAAGTTTTT 
CTTTTGATACTCAATGAAAATCAAAGAGCAAACTATAGTTGATTGAGTTTGGAATAGTAT 
GCTGTAG 

ORF Predictions: 

ORF # Start End Direction Length 



1 356 1027 F 224 aa 

> 3860582-1 ORF translation from 356-1027, direction F 
VLPLYLLFVPYGKSKKEVKKRAKEASRLTREMKGLIFTLAIEAAVVVCTNTAITIRIPSL
MVERGLGDAQLSSFVLSIMQLIGIVAGVSFSFLISIFKEKLLLWSGITFGLGQIVIALSS
SLWVVVAGSVLAGFAYSVVLTTVFQLVSERIPAKLLNQATSFAVLGCSFGAFTTPFVLGA
IGLLTHNGMLVFSILGGWLIVISIFVMYLLQKRALGLIPKFFF*

Description : 
unknown 

Assembly ID: 3860724 
Assembly Length: 1191bp

> 3860724 Strep Assembly -- Assembly id#3860724

GGATTCCAACGATTATGAACTTGACTGGTCCACTGATTCATCCAATGGCTTTAGAAACAC 
AGCTTTCTTGGAATTAGTCGTCCAGACTCCTAGAAAGTACAGCTCAGGTTTTGAAAATAT 
GGTCGCAAACGTGCCATCGTGGTTGCTGGACCAGAAGGGTTGGATGAAGCTGGCTTGAAC 
GGAACAACCNAGATTGCACTTNTTGAAAATGGCGAAATCAGCTTGTCAAGCTTTACTCCA 
GAGGATTTGGGAATGGAAGGCTATGCTATGGAAGATATTCGTGGTGGGAATGCTCAGGAA 
AATGCAGAAATTTTGCTTAGCGTTCTGAAAAACGAAGCAAGTCCATTCTTGGAAACGACA 
GTCTTGAATGCTGGTCTTGGTTTCTATGCTAATGGTAAGATTGATAGCATCAAGGAAGGA 
GTTGCCTTGGCCCGTCAAGTGATTGCTAGAGGCAAGGCCCTTGAAAAACTCAGACTGTTA 
CAGGAGTACCAAAAATGAGTCAGGAATTTTTAGCACGAATCTTAGAGCAGAAGGCGCGTG 
AGGTGGAGCAGATGAAGCTGGAGCAAATCCAGCCTCTGCGCCAGACCTATCGCTTGGCAG 
AATTTTTGAAGAATCATCAGGACCGCTTGCAGGTAATCGCTGAGTCAAGAAAGCTAGCCC 
TAGTTTGGGAGATATCAATCTCGATGTGGATATTGTGCAACAGGCCCAGACTTATGAAGA 
AAACGGAGCAGTGATGATTTCGGTGTTGACAGATGAGGTTTTCTTTAAAGGGCATTTGGA 
TTATCTACGGGAAATTTCCAGTCAGGTAGAGATTCCGACGCTCAACAAAGACTTTATCAT 
AGATGAAAAGCAAATCATCCGCGCTCGCAATGCAGGTGCGACAGTTATCTTGCTTATTGT 
GGCAGCCTTGTCCGAAGAACGCCTCAAGGAACTGTATGACTACGCGACAGAGCTTGGTCT
GGAAGTCTTAGTGGAGACTCACAATCTAGCTGAACTAGAGGTAGCCCACAGACTTGGTGG
CTGAGATTATCGGGGTCAACAACCGCAACTTGACTACCTTTGAAGTCGACTTGCAGACCA 
GTGTAGATTTAGCCCCTTACTTTGAGGAAGGTCGCTATTACATTTCTGAATCTGCCATTT 
TCACAGGGCAGGATGCGGAACGACTAGCCCCATACTTTAACGGAATTCGAT 

ORF Predictions: 

ORF # Start End Direction Length 



1 139 498 F 120 aa 

2 686 1024 F 113 aa 

> 3860724-1 ORF translation from 139-498, direction F
VVAGPEGLDEAGLNGTTXIALXENGEISLSSFTPEDLGMEGYAMEDIRGGNAQENAEILL
SVLKNEASPFLETTVLNAGLGFYANGKIDSIKEGVALARQVIARGKALEKLRLLQEYQK*

Description : 

ANTHRANILATE PHOSPHORIBOSYLTRANSFERASE (EC 2.4.2.18). -
LACTOCOCCUS LACTIS (SUBSP. LACTIS) (STREPTOCOCCUS LACTIS).

> 3860724-2 ORF translation from 686-1024, direction F
VDIVQQAQTYEENGAVMISVLTDEVFFKGHLDYLREISSQVEIPTLNKDFIIDEKQIIRA 
RNAGATVILLIVAALSEERLKELYDYATELGLEVLVETHNLAELEVAHRLGG*

Description : 

INDOLE-3-GLYCEROL PHOSPHATE SYNTHASE (EC 4.1.1.48) (IGPS). -
LACTOCOCCUS LACTIS (SUBSP. LACTIS) (STREPTOCOCCUS LACTIS).

Assembly ID: 3860858 
Assembly Length: 858bp

> 3860858 Strep Assembly -- Assembly id#3860858

ATCGAATTTGCCAACCAAGAAAAATATCCCTTGGATGGTTCTTGGCAATGCAAGCAATAT 
CATCGTTCGTGATGGTGGGATTCGTGGATTTGTCATCTTGTGTGACAAGCTCAATAACGT 
TTCTGTTGATGGCTATACCATTGAAGCAGAAGCTGGGGCTAACTTGATTGAAACAACTCG 
CATTGCCCTCCGTCATAGTTTAACTGGCTTTGAGTTTGCTTGTGGTATTCCAGGAAGCGT 
TGGCGGTGCTGTCTTTATGAATGCGGGTGCCTATGGTGGCGAGATTGCTCACATCTTGCA 
GTCTTGTAAGGTCTTGACCAAGGATGGAGAAATCGAAACCCTGTCTGCTAAAGACTTGGC 
TTTTGGTTACCGCCATTCAGCTATTCAGGAGTCTGGTGCAGTTGTCTTGTCAGTTAAATT 
TGCCCTAGCTCCAGGAACCCATCAGGTTATCAAGCAGGAAATGGACCGCTTGACGCACCT 
ACGTGAACTCAAGCAACCTTTGGAATACCCATCTTGTGGCTCGGTCTTTAAGCGTCCAGT 
CGGGCATTTTGCAGGTCAGTTCGAATTTCAGAAGCTGGCTTGAAAGGCTATCGTATCGGT 
GGCGTAGAAGTGTCAGAAAAGCATGCAGGATTTATGATCAATGTCGCAGATGGAACGGCC 
AAAGACTACGAGGACTTGATCCAATCGGTTATCGAAAAAGTCAAGGAACACTCAGGTATT 
ACGCTTGAAAGAGAAGTCCGGATCTTGGGTGAAAGCCTATCGGTAGCGAAGATGTATGCA 
GGTGGTTTTACTCCCTGCAAGAGGTAGTGGGGACCTGACAGAGCCCCGATCGGTTAATCT 
ATGAAAAAGAAGGAATTT 

ORF Predictions:

ORF # Start End Direction Length 



1 610 807 F 66 aa 

> 3860858-1 ORF translation from 610-807, direction F
VSEKHAGFMINVADGTAKDYEDLIQSVIEKVKEHSGITLEREVRILGESLSVAKMYAGGF
TPCKR*

Description : 
unknown 

Assembly ID: 3860890
Assembly Length: 980bp

> 3860890 Strep Assembly -- Assembly id#3860890

CTGAAAAAACAGGTTTTGACTATGNAGATTGACAGACGACCGTTCGGAGGTGCAGATATT 
GATGCAGCAGGACCTCCCTTACCTGATGAAACCCTTAAGGCAAGTAGGGAAGCAGATGCT 
ATCCTACTAGTAGCTATCGGTAGTCCTCAGTATGATGGAGTAGCGGTTCGCCCTGAACAA 
GGCCTGATGGCTCTCCGTAAGAACTCAATCTTTACGCTAATATTCGTCCTGTAAAAATCT 
TTGACAGTCTCAAGTATTTGTCACCACTCAAACCGGAACGAATTTCTGGTGTAGACTTCG 
TCGTGGTGCGTGAATTGACTAGGCGAGATTTACTTTGGAGATCATATCCTTGAAGAGCGC 
AAAGCGCGTGATATCAACGACTATAGCTATGAGGAAGTGGAGCGGATTATTCGCAAAGCC 
TTTGCCATCGAATTGCAAGAAATCGCAGAAAAATCGTTACTAGTATCGATAAGCAAAATG 
TTCTAGCGACCTCAAAACTCTGGCGGAAAGTAGCTGAGGAAGTCGCACAGGATTTCTCAG 
ATGTAACCTTGGAACACCAGCTGGTAGACTCAGCTGCTATGCTTATGATTACCAATCCTG 
CTAAGTTTGATGTTATTGTAACGGAGAATCTTTTTGGAGATATTTTATCTGATGAATCAA 
GCGTCTTATCTGGTACACTTGGGGTTATGCCATCAGCCAGTCATTCTGAAAATGGACCAA 
GTCTCTATGAACCTATTCACGGTTCAGCACCTGATATTGCAGGTCAAGGAATTGCCAATC 
CTATTTCCATGATTTTATCAGTTGTCATGATGTTGAGAGATAGTTTCGGACGTTATGAGG 
ATACAGAGCGTATCAAACGTGCTGTTGAGACAAGTCTGGCGGCAGGAATTTTAACGAGAG 
ATATAGGAGGTCAGGCTTCAACAAAGGAAATGATGGAAGCTATTATTGCAAGGTTATGAA 
GTTAGACGAAAAAATTCGAT 

ORF Predictions: 

ORF # Start End Direction Length 

1 397 486 F 30 aa

> 3860890-2 ORF translation from 397-486, direction F
VERIIRKAFAIELQEIAEKSLLVSISKMF*

Description : 
unknown 

Assembly ID: 3860952 
Assembly Length: 874bp

> 3860952 Strep Assembly -- Assembly id#3860952 

TCGATCTAGAGAATTGCTCCAGAGCTTCCTGACCGTCCGCTGCCTCAATAGTTTCATAGC 
CACAATCCGTCAAATAATCACTGACCCCCTCACGGATCATCTCTTCATCTTCTACAATTA 
AAATTTTCATACTTTAACTGCTCTCTATTTTTTATTTTTCTTAGAATAAATACCTACTCT 
ATTTTCTATTATAGTCTCTTGCTGGCCTTTTGTATGTAAGCAACTGACCACTAGATAAAA 
CGTTGTGAAATTCCTTTCTCATAAATTCCATAACTTTAGTATATTATATTTAAGCACTAA 
AGTACAAAGAAAGCAACTGAAAGCAATGATTTTCACCACTGCTTTCAGATTTATTTTGAA 
TTGTTAAATAGCTATTCCTATCCACTATTCTTGAATAGAAACACAAGATGCAATCTTCAT
TCCAGACTCATTTTTTAAAAAATCAAATTTATTCACCATCCAGCAAGAGCTCTTTTGGTT 
GTTTTCTAAGGAGATTGCTTGAAGCAAGCGCCATAACGAGAACCACTAGAACCAAGGCAA 
GGACAAAAATGATGATAAAGTCTGATGTCTGAATGGAAATGTCTAGGCTCGACAAGGTCT 
TGCTAAAGCCATCTACTTCTGCACCGCCACCAAGGTTAGAGGCTTGAGCCGCCTTACTAG 
CCTGTTTGGCAACACCTGAAGTCACATTGGCAAGGACAGTGTTTCCAATTCGCACGGGCA 
GTGTAATTAGCTAGGAAGTAAGCANAAACTAGAGCAGGGATAGCAATCAAGATAGATTCG 
GTGATGAATTGACCCAAGATACTTGCCTGCTTGAGACCAATAGAGAGGAGGATTCCCACT 
TCCTTGCCGACGGGCATTGATCCAAAGACTGAGC 

ORF Predictions: 

ORF # Start End Direction Length 



1 449 715 R 89 aa 

> 3860952-1 ORF translation from 449-715, direction R 
VRIGNTVLANVTSGVAKQASKAAQASNLGGGAEVDGFSKTLSSLDISIQTSDFIIIFVLA 
LVLVVLVMALASSNLLRKQPKELLLDGE*

Description: 
unknown 

Assembly ID: 3860962 
Assembly Length: 762bp

> 3860962 Strep Assembly -- Assembly id#3860962

CTTGTAACGGTCATAAAGTTTCTGCAAACTACCAl'CCTTGCTCCATTTAGTAACCAAGTT 
ATCAAGATAGTCGTTGAGCTCTGTATTTGATTTCTTGGTAACAATACCGTAGTCAGATGG 
CTTGAAACTATCATCTAGTAGTTCTGTGCGTTTAACTAGTGTAGCCAGATAGAATAGAGC 
GGTCAACGGAAAAGGCATCGATACGATGAGCGTGAAGGGAAGTAATCAATTCTGGGTAGG 
AACCAAGTTCGACGAATTTAAACTTCAGACCTTTCTTTTTACCCAGTTCAGTAATCAGGC 
GTTGGGTGATAGAACCTTGGGCGACTCCGATGGTTTTGCCGTTTAGGTCCTCAATCTTTT 
TGATTTTGGCAGATTTATTGACCAAAAATCCAGAAGCGTCTGTGTAGTAGGGACTGGTAA 
AGTTGTAGAGTTTTTTGCGTTCGTCCGTGATGGTAAAGGTCGCGATATCCATATCGACCT 
GTTCATTGTCTAGAAGGGGGCCGCGGGTTTGTGCTGTAACCGGCACATAGTGAATCTTGA 
CCTTGAGTTCATCAGCTACCATTTTGGCCAAGTCGGTTTCGATACCAGAATAAGTACCGG 
TCTTGGGATCTTTGTTAACCAAAATTGGGAACGTCTTGTTTGACACCCGACAACCAGTTC 
GCCTCTTTTTTGAATGTCTGCGATACTAGTATTAGCCTGGACTGGTTTGGCAGCAACAAG 
GCCGAAAAGGCTAATCAATAATGCTGATAAAAAGAATTCGAT 

ORF Predictions: 

ORF # Start End Direction Length 

1 152 646 R 165 aa

> 3860962-1 ORF translation from 152-646, direction R 
VSNKTFPILVNKDPKTGTYSGIETDLAKMVADELKVKIHYVPVTAQTRGPLLDNEQVDMD
IATFTITDERKKLYNFTSPYYTDASGFLVNKSAKIKKIEDLNGKTIGVAQGSITQRLITE 
LGKKKGLKFKFVELGSYPELITSLHAHRIDAFSVDRSILSGYTS* 

Description : 

cell adhesion factor PEB1 precursor - Campylobacter jejuni 

Assembly ID: 3861268 
Assembly Length: 1942bp 

> 3861268 Strep Assembly -- Assembly id#3861268

CTCGAATTTTTGGTGCTCCAGAAACGGTTCCAGCAGGAAGCGTTGCTTTCAAGGCATCCA 
TGGCAGTGAGTTCTGCAAGCAAACGTCCCTTGACCACACTGGTCAAATGCATGACGTAGC 
GGAAGAGCTCCACCTCCATATACTTAGTAACTTGGACACTGGCCGTTTCAGAGATGCGGC 
CAATATCGTTACGCCCCAAGTCTACCAACATTCGATGTTCTGCTGTTTCCTTCTCATCAG 
AGAGGAGGTCAGTCGCCAAGGCCTTGTCTTCTCCATCCGTAGCCCCTCTTGGTCGCGTCC 
CTGCAATCGGATTGGTTGTCACGATGCCATTTTTGACAGAAACCAAACTTTCTGGACTAG 
CTCCGATGATTTGATAATCCCCAAAATCATACAAATAAAGGTAATTAGATGGATTAGTCA 
CGCGGAGATTTCTGTAGAAGTCAAATGGATTTCCAGTTAACTTCTGCGTGAAGAAAACGC 
TGGCTGAGTTACACATCGGAACATATCTCCGTTACGAATCAAGTCACGAGCTGTTTCTAC 
CATTCCCTCAAACTTATGTGGAGCGATATGCGGTTTGAAGTCAAGTGGTGATAAATCCAA 
GTCTTCAAATTCATTTGGAGCAGGAATGCGTAATTCCTCAAGCACTTGGTTCAAGGATTT 
TTCCAAGGCCTCTTGACTGCGCTCACTATAAAGTGCATCCTCTATGACATGTTATCTTCT 
CCTTCTTGTTGGTCAAAGACCATATAGCTCTCATAGACAAAGAAATGCATGTCGGGCGTC 
CCAATTGTATCCTCAGGGATTTGACCAATTTCTTCATAAAGCGAAATCATATCGTAACCA 
ACAAAACCAATGGCTCCCCCACCAAAAGGGAGGTCTGAATGGTGCTGGCTCTTATGAATC 
ACTTCATAAAGGAAATCCAAGGGATCCCGATCAATCGCTTGACCATTTTGATAGAGAACT 
CCATTTTCAAACTTAATCTCAAAAACTGGATTATAGGCTAGGATAGAAAAACGAGCTGTT 
TCCTTGTCTCTCGGAATACTCTCTAAAATAACCTTATGTTGCCCCTTTAAGCGCATATAA 
GCCAAGATTGGTGATAAGACATCTCCATGAATGATTCGTTCCATTGTCATTTCCCTTTCA 
GTTCTAATTCGAGTTCGTGGCGACTGTATGAAAAATCCCCACGCAAAATAACTTGCGTGA 
GGACGAAATTCGCGGTGCCACCTCAATTATAGGATTTCTCCTATCTCTCATTCCTGTCTC 
AGATATCTCCTGTAACAGGCTGTGCGATAAAGGGCACTCCCTTGAGAATGATGTTTTCTT 
CTCTCGTTTCAGATGAACCCAACTTTACAGCTTTCTCTGCTTGTTTTCAGCAACCACAAG 
CTCTCTGTGAGAGAAAAGACTGTAATTTTTCCATCTATTATTTTTTAGCTTCTAGTAATC 
TGCAATCGCAGCTAGGTCCTTGCCTCCACGACCAGAGACATTGATGAAGAGATGTTCATC 
TCGGTACACCTTTATACTCTTCGAAAATCTCTTCAAACCGCGTCAACGTCGCCTTGCCGT 
AGGTATGGTTACTGACTTCGTCAGTTCTATCTGCAACCTCAAAACAGTGTTTTGAGCTGA 
CTTCGTCAGTCTTATCGACAACCTCAAAACAGTGTTTTGAGCAGCCTGCAGCTAGTTTCC 
TAGTTTGCTCTTTGATTTTCATTGAGTATTATTTCATTTTCTCCTGCAATTGAATTCTTG 
CTCAGCTTTTTGTCTTCTATTTCTTTAAAATCAAAGTAGCTCTTTTGTTAATAACTCGAT
CAACAAACATCGTGGTACAAGTATCTACTTTGAAATTTATCAACCACTTAACAACTGATA 
CTGTATTTCTAGGAAAACGATGACATTCTTCCTAATAAAACTTCTCATATATAGCATAAA 
TTTCTACTCTTTTTAATTCGAT 

ORF Predictions:

ORF # Start End Direction Length 



1 457 645 R 63 aa 

> 3861268-1 ORF translation from 457-645, direction R 
VLEELRIPAPNEFEDLDLSPLDFKPHIAPHKFEGMVETARDLIRNGDMFRCVTQPAFSSR 
RS* 

Description : 

ANTHRANILATE SYNTHASE COMPONENT I (EC 4.1.3.27). - LACTOCOCCUS
LACTIS (SUBSP. LACTIS) (STREPTOCOCCUS LACTIS).

Assembly ID: 3861270 
Assembly Length: 1048bp 

> 3861270 Strep Assembly -- Assembly id#3861270 

CTGTTAAGATTGTTTCCGTGCATCCACATAGGATTTACCTTGTCTGTATGGGCCAATTCA 
CCCATCAAAACGCCATAGGTCTCATCTGTCAAGATACTAGACATACCGATATTGTACCAA 
AGACTGGTATGACGGAAATAAGTCGATGCGTGTAAACTCAACAAAAAGAGACGCAAGTTG 
ATTAGAAAAACCGTCATAGCAATAGCTGCCACAGGAGCTTGAACCACAATCAGTGCCAAC 
ATGGCAAACTGGGCACTCCCAGCATAAACAAAGAGACTCATCAAGCCCATCTCAACAGGT 
GTCACATAGGGCGCACCGATAGTCCCACAGGCCAGGCCGATACTGACATAGCCAAGAGCC 
GTTGGCATGGCTGCCTGCGCCCCCTCCTAAAATCCTTTTTCTTTCATCTTTCTCCTCATA 
TTGTCTTAATAATACTCAATGAAAATCAAAGAGCAAACTAGGAAATTAGCCGCAGGNTGC 
TCAAAACACCGTTTTGAGGTTGCAGATAGAAACTGACGAAGTCAGCTCAAAACACCGTTT 
TGAGGTTGCAGATAGAACTGACGAAGTCAGTAACATATATACGGCAAGGCGACGTTGACG 
TGGTTTGAAGAGATTTTCGAAGAGTATTAGAAAATGCCGATAAGGGTCTGCATACCAAGG 
CTGGTGAGGATGATGGCAATCCAGCAGACGGCTCCGAGAACAATGGATTTTCCACTGGAT 
TTGACCATAGCGACCAGATTAGTTTTGAGACCGATGGCACTCATGGCCATGATAATGAGG 
AATTTAGAGAGTTGTTTGAGAGGGGTAAAGAAACTACTAGACACACCGAGAGAGGTCAGA 
AGGGTGGTTAGGAGCGATGCAAGGATGAAGTAAAGGATAAAAAGTGGGAAGACTTTTTTC 
AGTTGTAAGCCTTGCTTATTTTTTTGCTCGCGACTTTGCCAGTAGGAGAGAAAGAGAGTG 
ATGGGGATGATAGCTAGGGTGCGCGTGAGTTTGACAATGGTTGCGGATTCGAGGGTATTG 
GTCTGGTAGAGACTGTCCCAAGCGCTAG

ORF Predictions: 

ORF # Start End Direction Length 

1 627 824 R 66 aa

> 3861270-1 ORF translation from 627-824, direction R 
VSSSFFTPLKQLSKFLIIMAMSAIGLKTNLVAMVKSSGKSIVLGAVCWIAIILTSLGMQT 
LIGIF* 

Description : 
unknown 

Assembly ID: 3861288 
Assembly Length: 1571bp 

> 3861288 Strep Assembly -- Assembly id#3861288 

AGAGCTGGTAATATTCCCAAAGAAACGGCTCAAATCGAATTAGAAAGCCTTCTGCAAAAA 
GGAATCCCAGTCGCTCTGGTATCACGATGCTTTAACGGTATTGCCGAGCCTGTTTATGCC 
TACCAGGGTGGGGGCGTACAGTTGCAAAAAGCAGGCGTTTTCTTTGTTAAAGAACTCAAC 
GCCCAAAAAGCCCGCTTGAAACTCCTCATCGCCCTCAATGCCGGACTAACAGGACAGGCT 
TTGAAAGACTATATGGAAGGCTAATACTCTTCGAAAATCTCTGCAAACCACGTCAGCGTC 
GCCTTACCGTATGTAGAGCACAAAATCAGGAAATCTTCTCGATTCCCTGATTTTTTCTAT 
TTACGTTTTCGTGTTGAGCTACGTTCTGTCAAACCATGAGGTAAGAGAACTTCACGTTCT 
TCCAACTCTTCCTTATGCATAATCTTGGTCAACATACGCATACTAATGGCACCAAGGTCA 
TAAAGAGGTTGGGCAATCGTTGTCAAGTTTGGACGGGTAAAGCGTGAGATTTGTGAATCA 
TCACTAGTAATAATTCGATAATCTTCTGGCACAGAAACACCTTATCAGCCAAACCGTTCA 
AGACTCCTGCTGCCAACTCATCACCTGTCACAACTGCTGCAGTTGCATTTGATGAAATCA 
AACGCTCTGCTAAGGCGTAACCATCATCATAGCTATATTTAGATTCAAATACCAAACCCT 
CACTATAAGCGATTCCTGCTTTTTTCAAGGTTTCCTTGTAGCCAACTAAACGAACCTTAC 
CATTGATGTCATCCACTAGCGGACCGCTAACGAAAGCAATACGCTCATTTTCTTTAGCAA 
GGTAACTCACTGCATCAATTGTTGCTTGCTTATAGTCAATATTGACACTTGGCAACTGGT 
GCTCAACATCGACAGTTCCTGCGAGAACAATCGGAGTACGTGAACGCGAAAATTCTGAGC 
GAATTTTATCTGTCAAGTGATAACCCATATAGATAATGCCATCTACCTGCTTTGAAAAGA 
GGGTATTGACAACAGAAACTTCTTTCTCGTTATCTTCATCGCTATTAGCTAGGACAATAT 
TGTACTTGTACATTTCTGCAATATCATCAATCCCCTTAGCCAAACTCGAAAAATAACCAT 
TGGTAATATTTGGAATCACGACACCGACAGTGGTTGTCTTTTTACTTGCAAGACCACGCG 
CAACTGCATTTGGACGATAATCCAAACGATCAATTACCTCTAGCACTTTTTTACGGGTAT 
TCTCTTTTACATTTTTATTGCCATTGACCACACGGCTGACCGTCGCCATGGGAAACACCT 
GCTTCACGAGCGACATCATAAATGGTTACTGTATCATCTGCATTCATTCCTTTTCCTGTC 
CTTTCTATCTCCACACATTCTTTTACAAGTAGAAGTGCTGAATTGAAAGCTCTATATCTT 
ACTTACAAAAATGAAGATGTGAAAATTTCGTTTTCATATTTCTACTTATTCCATTCTATC 
ACTAATTGTAAACACTTTCAAGTGTTTTTTGAAGATTGATTGAAAAAATTTCATAGAAAA 
CCTAGGTTTAG 

ORF Predictions: 

ORF # Start End Direction Length



1 357 572 R 72 aa 

> 3861288-1 ORF translation from 357-572, direction R 
VPEDYRIITSDDSQISRFTRPNLTTIAQPLYDLGAISMRMLTKIMHKEELEEREVLLPHG 
LTERSSTRKRK* 

Description : 

GLUCOSE-RESISTANCE AMYLASE REGULATOR. - BACILLUS SUBTILIS.

Assembly ID: 3861306 
Assembly Length: 1682bp 

> 3861306 Strep Assembly -- Assembly id#3861306 

CTGACGTAAAAAAGATTTTCGGAAAAGTATCATCATCTATTTTAGACCATTTTCTTATAA 
TAACCATTTTATTTTTATTTGTCAAGGTCTTTGAATTCTTTCTTAAACAAGCCTTGTAAT 
CTCTACTTTTGAAGAATTTATTTTTCCTTACTGACAAGATTTGAGACGGTAGGAATCATT 
GAAAATAACCTAGCCAACATCAATCACAATCATTTCTCCTTTCTCAATTACACTAAATTA 
TAGTGTATTGAATCTATAACAGTGCACCTTGGCTGCTAAAATATTTCTATAAATTAATTT 
GACTTTCCTGATAGAGTTGTTCACATCTTATTTCAATTCACTATACTTTCCCTTATACTC 
AATGAAAATCAAAGCGCAAACTAGGAAGCTAGCCACAGGCTGCTCAAAGCACTGCTTTGA 
GGTTGTAGATAAGACTGACGAAGTCAGTTACATATATCTACGGCAAGGCGAAGCTGACGC 
GGTTTGAAGAGATTTTCGAAGAGTATAAAGTTTGTTTCTGTATCTTTCAGAAAAATAAGG 
TATACTGTATGTAAACGATTTCAAAGGAGTCCAGTTATGGCAAAAACATTTTTTATTCCA 
AATAAACAGAGCATTTTAGGAGAACAAGAGATTTTGAATGCCAAGTCGATCTTGGCTATG 
ATGTAGTCTATCTCCGTCAGCCTCTTAATCGTCTCGAGTATATTGAGTGTGCGATAGTGG 
GGCAATCACAATTTCTTTTTAAGGTCAGTTATGCTGATGGTCAAAAGGCTTACCGTGTCG 
ATCTTCCTGACCTACTAACAAAGACAGACTGGCAGATTATCAAGTCATTTTTAGATGTTT 
TGCTTGCTTATACAGGGACTGATATTGAAGGGCTAGATGGTTTTGATTTTGAAGCTTATT 
TCCAAGCAAGTATTCAAGCCTATCTAGCAGACCCTGTAGCTCGTTTTACGATTTGCCAAC 
GAATTTTTAATCCTATTTTCTTTAGTCGTGAGAACTTGAAAAGCTTTTTAGAGGCAGATG 
GCTTGGCTCAGTTTGAAGCGCGTGTGCGTGCGGTTCAAGAGACAGATGCCTACTTTGCGA 
GAGTTTCCTTCTATCAGGATGGAGAAGGAAAAGTGCATGGCGTTTACCATCTAGCTCAAG 
GAGTCAAGACAGTTTTACCGAGAGAACCGTTTGTTCCTGCAGCCTATATTGAGCGAATTG 
GTGGATAAGGAAGTCCAGTGGGAGATTGACTTGGTTCAAATCACAGGAGACGGCTCTAAA 
CCAGAAGACTATGAATCCATAGCTCGCTTGGACTATGCAAAATTCTTAGAGGTATTACCC 
CCATCTTTTTACCACCAACTAGACGCCAATCAAATAGAAATACAACCCATCCTAGGACAA 
GATTTTAAAACATTAGCACAAGAAAAGTAAAGCAGAAGCAGGTCAATCGACTTGCTTTTT 
TGACATAGAAAAAATCCTGCCAAGGATGACAGGATTGCTACTCAATGAAAATCAAAGAGC 
AAACTAGGAAGCTAGCCGCAGGCTGTACTTGAGTACGGTAAGGCGAAGCTGACGTGGTTT 
GAATTTGATTTTCGAAGAGTATGAATTTTAAAGAAAGGCCAAGATACGAAGATAATCTCC 
AATCAGTGCCACTTCAGCTTCCAAGAAGAAGAAGATTATAACTCCCGTTCCCCAAGGACA 
GA

ORF Predictions: 

ORF # Start End Direction Length



1 717 1208 F 164 aa 

2 1201 1410 F 70 aa 

> 3861306-1 ORF translation from 717-1208, direction F 
VGQSQFLFKVSYADGQKAYRVDLPDLLTKTDWQIIKSFLDVLLAYTGTDIEGLDGFDFEA 
YFQASIQAYLADPVARFTICQRIFNPIFFSRENLKSFLEADGLAQFEARVRAVQETDAYF 
ARVSFYQDGEGKVHGVYHLAQGVKTVLPREPFVPAAYIERIGG* 

Description : 
unknown 

> 3861306-2 ORF translation from 1201-1410, direction F 
VDKEVQWEIDLVQITGDGSKPEDYESIARLDYAKFLEVLPPSFYHQLDANQIEIQPILGQ 
DFKTLAQEK* 

Description : 
unknown 
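
The ORF translations can be reproduced from the printed assembly sequence. The following is a minimal Python sketch with several assumptions. It uses the standard codon assignments, which NCBI translation tables 1 and 11 share. It translates every codon literally, including the initiator, so a GTG start appears as V, as it does in these listings. Symbols other than A, C, G and T (for example N) give X. The names `CODON_TABLE` and `translate` are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Amino acids for codons in TCAG order (TTT, TTC, TTA, TTG, TCT, ...);
# identical in NCBI translation tables 1 and 11.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate a forward-strand DNA string codon by codon.

    A trailing partial codon is dropped; unknown codons become 'X'.
    """
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


# First 60 nt of ORF 3861306-2 (assembly positions 1201-1260).
print(translate("GTGGATAAGGAAGTCCAGTGGGAGATTGACTTGGTTCAAATCACAGGAGACGGCTCTAAA"))
```

Applied to assembly positions 1201 to 1260, this reproduces the first 20 residues of the 3861306-2 translation (VDKEVQWEIDLVQITGDGSK).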

Assembly ID: 3861334 
Assembly Length: 3041bp

> 3861334 Strep Assembly -- Assembly id#3861334 

ATCGAATTAAAAATGAGGTATTCAGGCTTGTGATTTTCTATGGAAGTTAATAGTGATTGC 
CTCTAATGCTTACAAGTGATATTAAAAATAGAGGACCTAGTGATGTCAATCATTTCAACT 
GATTTAACCCCTTTTCAAATAGATGATACATTGAAAGCAGCCTTGCGAGAAGATGTTCAT 
TCCGAAGATTACAGTACCAATGCCATTTTTGATCATCATGGCCAAGCCAAGGTGTCGCTT 
TTTGCCAAGGAAGCTGGTGTTTTAGCGGGGCTAACCGTTTTTCAAAGGGTTTTTACCCTA 
TTTGATGCCGAGGTGACCTTCCAGAATCCTCATCAATTTAAGGATGGGGATCGTTTGACT 
AGTGGCGATTTGGTTTTAGAAATCATAGGCTCGGTGAGAAGTCTCTTAACATGTGAACGC 
GTTGCCTTGAATTTTTTACAACATTTATCAGGGATCGCTTCGATGACAGCTGCTTATGTA 
GAAGCCTTAGGCGATGATTGCATTAAGGTATTTGATACTCGAAAAACTACTCCTAATTTA 
CGTCTTTTTGAGAAATATGCCGTGAGAGTTGGCGGTGGCTATAATCATCGCTTTAATTTA 
TCAGATGCTATCCTGCTAAAAGACAATCACATTGCGGCAGTAGGTAGTGTTCAAAGGGCA 
ATTGCTCAAGCGCGTGCCTATGCTCCTTTTGTGAAAATGGTCGAGGTGGAAGTGGAAAGC 
CTTGCTGCTGCCGAAGAAGCTGCGGCGGCGGGTGCTGATATTATCATGTTGGATAATATG 
TCATTGGAACAGATTGAACAGGCCATTACCCTAATTGCAGGACGTTCTCGGATTGAATGT 
TCTGGAAATATTGATATGACCACTATTAGCCGTTTTCGTGGTTTAGCGATTGATTACGTC 
TCCAGTGGTAGTTTAACCCATAGTGCTAAGAGTCTTGATTTTTCCATGAAGGGTTTAACC 
TACCTTGATGTCTAAGTTGTAAAATAAACTAACTTTTTAAAGGATGTCTTTCCTCTAiSAA
CGAGTTTTATGTCAGATAGTTTAAACGCCTCTTCAAATATAGTAAAATGAACCAAAAATA 
GTACACAATGTGGTATAATCTTCTTATGGCATATTCAATAGATTTTCGTAAAAAAGTTCT 
TTCTTATTGTGAGCGAACAGGTAGTATAACAGAAGCATCACACGTTTTCCAAATCTCACG 
TAATACCATTTATGGCTGGTTAAAGCTAAAAGAGAAAACAGGAGAGCTAAACCACCAAGT 
AAAAGGAACAAAACCAAGAAAAGTTGATAGAGATAGACTTAAAAACTATCTTACTGACAA 
TCCAGACGCTTATTTGACTGAAATAGCTTCTGAATTTGGCTGTCATCCAACTACCATCCA 
CTATGCGCTCAAAGCTATGGGCTACACTCGAAAAAAGGACCACACCTACTATGAACAAGA
CCCAGAAAAAGTAGCCTTATTTCTTAAAAATTTTAATAGTTTAAAGCACCTAGCACCTGT
TTAGATTGATGAAACAGGATTCGATACTTATTTTTATCGAGAATATGGTCGCTCATTAAA 
AGGTCAGTTAATAAGAGGTAAAGTATCTGGAAGAAGATATCAGAGGATTTCTTTGGTTGC 
AGGTCTAACAAATGGTGAGTTAATCGCTCCAATGACTTACGAAGAGACGATGACGAGCGA 
CTTTTTTGAAGCATGGTTTCAGAAGTTTCTCTTACCAACATTAACCACACCATCGGTTAT 
TATTATGGATAATGCAAGATTCCATAGAATGGGTAAGTTAGAACTTTTATGCGAGGAGTT 
TGGGCATAAACTTTTACCTCTTCCTCCCTACTCGCCTGAGTACAATCTTATTGAGAAAAC 
ATGGGCTCATATCAAAAAGCACCTCAAAAAGGTATTACCAAGTTGCAATACCTTTTATGA 
GGCTCTTTTGTCCTGCTCTTGTTTCAATTGACTATAGTTCACGGATACAGTTGGGAAAGA 
AGTTAAATGTAGTTGGATTTCCACTAAAGGTTGATGAGTAAGTTTTTGTATCTGAACCTG 
ATTGGCCGCAAGCAGCTAAAAGCAAAGCAGATGCAAAAGTCAGACCTGCACCAAGGACAC 
GCTTCTTTATGTTCATCTTCTTTCTCCTTAATAGTGGGAATTTGTAAAGTTAATTGAATT 
TCAAGAATGAAGGTTTTATAAACTTTGGTTATAAAAAACAAAGGATTTCTGTCTTTTATA 
CAGTCCTCCCCTTGTTTTTATACGATTTCAATTTTAAATTTTTCTGCAAAAAATATTTAT 
AGTAATTCCACACAGAAAGCATCCCATGGAACTAAGATTTGTTTTTCAAAGACTTCTTGA 
GCTAGGGTGTTTTCAATCAAGACAGATTTGACTTTTCCTTCTACTGTCAAGTCTTGCTCT 
TCATTGGACAAGTTAGCCACAACTAGGAAGCGACGGTCGCCATCCTTACGTATATAAGCA 
AAGACCTTATCAGCCGTATCAAGCAATTCAAAGTCAGCTCGAATTAGCCAACTATTCTCC 
TTGCGAATTTGGACCAGTTTCTGATAGGTATAGAAAATAGAATCTGGATTTGCCAGCGCT 
TCTTGGACGTTGATCATCTCGTAATTTGGATTAACTGCCAACCAAGGTTGACCTGTTGAG 
AAACCAGCGTTTTTGCTCTCGTCCCATTGCATAGGGGTACGGGCATTGTCACGTCCAATA 
ACACGGATACTGTCCATGATTTCTTGCATCGGAACACCTTTTTCAAGAGCCTCACGCGCA 
TAGTTGAGAGATTCAATATCTTCTACTTGATCCAGTGTTTCAAACGGATAGTTGGTCATC 
CCAATCTCCTCACCTTGGTAGATATAAGGAGTTCCTCTCATAAGATGAAGCAAGATTGCA 
AAGGCTTTGGCAGATTTTTCGCGGTATTCTTGGTCATTTCCCCAGATTGAGACAATACGA 
GGGAGGTCATGGTTGTTCCAGAAGAGGGAATTCCAGCCGTCCTCAACTCCTAACTCTGTC 
TGCCATTTGTTGAAGATTTCTTTTAACTTAGCGATATTCAG 

ORF Predictions:

ORF # Start End Direction Length 



1 76 975 F 300 aa 

> 3861334-1 ORF translation from 76-975, direction F 
VILKIEDLVMSIISTDLTPFQIDDTLKAALREDVHSEDYSTNAIFDHHGQAKVSLFAKEA 




WO 98/19689 



PCT/US97/19226 



GVLAGLTVFQRVFTLFDAEVTFQNPHQFKDGDRLTSGDLVLEIIGSVRSLLTCERVALNF 
LQHLSGIASMTAAYVEALGDDCIKVFDTRKTTPNLRLFEKYAVRVGGGYNHRFNLSDAIL
LKDNHIAAVGSVQRAIAQARAYAPFVKMVEVEVESLAAAEEAAAAGADIIMLDNMSLEQI 
EQAITLIAGRSRIECSGNIDMTTISRFRGLAIDYVSSGSLTHSAKSLDFSMKGLTYLDV* 

Description : 

PROBABLE NICOTINATE-NUCLEOTIDE PYROPHOSPHORYLASE (CARBOXYLATING)
(EC 2.4.2.19) (QUINOLINATE PHOSPHORIBOSYLTRANSFERASE
(DECARBOXYLATING)) (QAPRTASE) (FRAGMENT). - BACILLUS
SUBTILIS (BLAST)

Assembly ID: 3864148 
Assembly Length: 4694bp

> 3864148 Strep Assembly -- Assembly id#3864148 

TTAATTTAAATTCTTAAAATTTTTTCATAATAATCTCCCTATAAAAATAAAGTCGCCCAA 
TCAGGCGGCTTATTTTTTTGAAAAATGGGCTTGGTGCCTGAGAATAAATAGCTTAGTGAT 
AGAAGAAAATGGGGAAATATGGTATAATGAAACGATAGATTTTTGAATAGGAATAAGATC 
ATGTTTGGATTTTTTAAGAAAGATAAAGGCTGTGGAAGTAGAGGTTCCGACACAGGTTCC 
TGCTCATATCGGCATCATCATGGATGGCAATGGCCGTTGGGCTAAAAAACGTATGCAACC 
GCGAGTTTTTGGACATAAGGCGGGCATGGAAGCATTGCAAACCGTGACCAAGGCAGCCAA 
CAAACTGGGCGTCAAGGTTATTACGGTCTATGCTTTTTCTACGGAAAACTGGACCCGTCC 
AGATCAGGAAGTCAAGTTTATCATGAACTTGCCAGTAGAGTTTTATGATAATTATGTCCC 
GGAACTACATGCGAATAATGTTAAGATTCAAATGATTGGGGAGACAGACCGCCTGCCTAA 
GCAAACCTTCGAAGCTTTAACCAAGGCTGAGGAATTGACTAAGAACAACACAGGATTGAT 
TCTTAATTTTGCTCTTAACTATGGTGGACGTGCTGAGATTACACAGGCGCTTAAGTTGAT 
TTCCCAGGATGTTTTAGATGCCAAAATCAACCCAGGTGACATCACAGAGGAATTGATTGG 
TAACTATCTCTTTACCCAGCATTTGCCTAAGGACTTACGAGACCCAGACTTGATTATCCG 
TACTAGTGGAGAATTGCGTTTGAGCAATTTCCTTCCATGGCAGGGAGCCTATAGTGAGCT 
TTATTTTACGGACACCTTATGGCCTGATTTTGACGAAGCGGCCTTGCAGGAAGCTATTCT 
TGCCTATAATCGTCGCCATCGCCGATTTGGAGGAGTTTAGGAGGAAATATGACCCAGGAT 
TTACAGAAAAGAACCTTGTTATGCAGGGATTGCCCTGACTATTTTCCTACCAATTTTAAT 
GATTGGGGGCTCTTGCTTCAGATAGCAATCGGAATCATANCCATGCTAGCCATGCATGAA 
CTTTTGAAGATGAGAGGTCTAGAGACCATGACGATGGAGGCCTCTTGACCCTCTTTGCAC 
NTTNGTATTGACCATTCCCCTGGAATCGAATTACCTGACTTTTTTGCCAGTTGATGGGAA 
TGTGGTTGCCTATAGTGTTTTGATTTCAATCATGTTAGGAACGACCGTTTTTAGCAAGTC 
TTATACGATTGAGGATGCGGTTTTCCCTCTTGCTATGAGCTTCTACGTGGGCTTTGGATT 
TAATGCTTTACTAGATGCTCGTGTTGCAGGTTTGGACAAGGCTCTCTTAGCCTTGTGTAT 
CGTCTGGGCGACAGACAGTGGTGCCTATCTTGTTGGGATGAACTATGGGAAACGAAAGTT 
AGCACCAAGGGTATCGCCTAATAAAACCCTTGAGGGTGCCTTGGGTGGTATTTTAGGAGC 
AATTTTAGTAACCATTATCTTTATGATAGTTGACAGTACAGTTGCTCTTCCATATGGAAT 
TTACAAGATGTCAGTCTTTGCTATTTTCTTTAGCATTGCTGGACAATTTGGTGATTTACT 
AGAAAGTTCGATCAAACGTCATTTTGGTGTTAAGGATTCTGGGAAATTTATCCCTGGACA 
TGGTGGTGTTTTGGATCGTTTCGATAGTATGTTGCTTGTATTTCCAATCATGCACTTATT
TGGACTCTTTTAATCAAAAGACGGAGGAAACGCTATGCTCGGAATTTTAACCTTTATTCT 
GGTTTTTGGGATTATTGTAGTGGTGCACGAGTTCGGGCACTTCTACTTTGCCAAGAAATC 
AGGGATTTTAGTACGTGAATTTGCCATCGGTATGGGACCTAAAATCTTTGCTCACATTGG 
CAAGGATGGAACGGCCTATACCATTCGAATCTTGCCTCTGGGTGGCTATGTCCGCATGGC 
CGGTTGGGGTGATGATACAACTGAAATCAAGACAGGAACGCCTGTTAGTTTGACACTTGC 
TGATGATGGTAAGGTTAAACGCATCAATCTCTCAGGTAAAAAATTGGATCAAACAGCCCT 
CCCTATGCAGGTGACCCAGTTTGATTTTGAAGACAAGCTCTTTATCAAAGGATTGGTTCT 
GGAAGAAGAAAAAACATTTGCAGTGGATCACGATGCAACGGTTGTGGAAGCAGATGGTAC 
TGAGGTTCGGATTGCACCTTTAGATGTTCAATATCAAAATGCGACTTTATCTGGGGCAAA 
CTGATTACCAATTTTGCAGGTCCTATGAACAATTTTATCTTAGGTGTTGTTGTTTTTTGG 
GTTTTAATCTTTATGCAGGGTGGTGTCAGAGATGTTGATACCAATCAGTTCCATATCATG 
CCCCAAGGTGCCTTGGCCAAGGTAGGAGTACCAGAAACGGCACAAATTACCAAGATTGGC 
TCACATGAGGTTAGCAACTGGGAAAGCTTGATCCAAGCTGTGGAAACAGAAACCAAAGAT 
AAGACGGCACCGACTTTGGATGTGACTATTTCTGAAAAGGGGAGTGACAAACAAGTCACT 
GTTACACCCGAAGATAGTCAAGGTCGTTACCTTCTAGGTGTTCAACCGGGGGTTAAGTCA 
GATTTTCTATCCATGTTTGTAGGTGGTTTTACAACTGCTGCTGACTCAGCTCTCCGAATT 
CTCTCAGCTCTGAAAAATCTGATTTTCCAACCGGATTTGAACAAGTTGGGTGGACCTGTT 
GCTATCTTTAAGGCAAGTAGTGATGCTGCTAAAAATGGAATTGAGAATATTCTTGTACTT 
CTTGGCAATGATTTCCATCAATATTGGGATTTTTAATCTTATTCCGATTCCAGCCTTGGA 
TGGTGGTAAGATTGTGCTCAATATCCTAGAAGCCATCCGCCGCAAACCATTGAAACAAGA 
AATTGAAACCTATGTCACCTTGGCCGGAGTGGTCATCATGGTTGTCTTGATGATTGCTGT 
GACTTGGAATGACATTATGCGACTCTTTTTTAGATAATCGAGGAATATTATGAAACAAAG 
TAAAATGCCTATCCCAACGCTTCGCGAAATGCCAAGCGATGCTCAAGTTATCAGCCATGC 
TCTTATGTTGCGTGCTGGTTATGTTCGCCAAGTTTCAGCAGGTGTTTATTCTTATCTACC 
ACTTGCCAACCGTGTGATTGAAAAAGCTAAAAACATCATGCGCCAAGAATTCGAAAAGAT 
TGGTGCTGTTGAGATGTTGGCTCCAGCCCTTCTTAGTGCAGAATTGTGGCGTGAATCAGG 
TCGTTACGAAACCTATGGTGAAGACCTTTACAAACTGAAAAACCGTGAAAAATCAGACTT 
TATCTTAGGTCCAACTCACGAAGAAACCTTTACAGCTATTGTCCGTGATTCTGTTAAATC 
TTACAAGCAATTGCCACTCAACCTTTATCAAATTCAGCCCAAGTATCGTGATGAAAAACG 
CCCACGTAATGGACTTCTTCGTACACGTGAGTTTATCATGAAGGATGCTTATAGTTTCCA 
CGCTAACTATGATAGTTTGGATAGTGTTTATGATGAGTACAAAGCAGCCTATGAGCGTAT 
TTTCACTCGTAGTGGTTTAGACTTCAAGGCTATTATTGGTGACGGTGGAGCCATGGGTGG 
TAAGGATAGCCAAGAATTTATGGCCATTACATCTGCTCGTACAGACCTTGACCGCTGGGT 
TGTCTTGGACAAGTCAGTTGCCTCATTTGACGAAATTCCTGCAGAAGTGCAAGAAGAAAT 
CAAGGCAGAATTGCTCAAATGGATAGTCTCTGGTGAAGATACCATTGCTTACTCAAGTGA 
GTCTAGCTATGCAGCTAACTTAGAAATGGCAACAAACGAGTACAAACCAAGCAACCGTGT 
TGTCGCTGAAGAAGAAGTTACTCGTGTTGAAACGCCAGATGTTAAATCAATTGATGAAGT 
TGCAGCCTTCCTCAATGTTCCAGAAGAACAAACGATTAAAACCCTCTTCTACATTGCAGA 
TGGTGAGCTTGTTGCAGCCCTTCTAGTTGGAAATGACCAACTCAACGAAGTCAAGTTGAA 
AAATCACTTGGGAGCAAATTTCTTTGACGTTGCTAGCGAAGAAGAAGTGGCGAATGTTGT 
TCAAGCAGGATTTGGTTCACTTGGACCAGTTGGTTTGCCAGAGAATATTAAAATTATTGC 
AGATCGTAAGGTGCAAGATGTTCGCAATGCAGTTGTCGGTGCTAACGAAGATGGCTACCA 
CTTGACTGGTGTGAACCCAGGCCGTGATTTTACTGCAGAATATGTGGATATCCGTGAA£T
TCGTGAGGGTGAAATTTCCCCAGATGGACAAGGTGTCCTTAACTTTGCGCGTGGTATTGA 
GATCGGTCATATTTTCAAACTCGGAACTCGCTATTCAGCAAGCATGGGAGCAGATGTCTT 
GGATGAAAATGGTCGTGCTGTGCCAATCATCATGGGATGTTACGGTATCGGTGTCAGCCG 
TCTTCTTTCAGCAGTGATGGAGCAACACGCTCGCCTCTTTGTTAACAAAACGCCAAAAGG 
TGAATACCGTTACGCTTGGGGAATCAATTTCCCTAAAGAATTGGCACCATTTGATGTGCA 
TTTGATTACTGTTAATGTCAAGGATGAAGAAGCGCAAGCCTTGACAGAAAAACTTGAAGC 
AAGCTTGATGGGAG 

ORF Predictions: 

ORF # Start End Direction Length 



1 212 940 F 243 aa 

2 1202 1753 F 184 aa 
3 2750 3037 F 96 aa 

> 3864148-1 ORF translation from 212-940, direction F 

VEVEVPTQVPAHIGIIMDGNGRWAKKRMQPRVFGHKAGMEALQTVTKAANKLGVKVITVY
AFSTENWTRPDQEVKFIMNLPVEFYDNYVPELHANNVKIQMIGETDRLPKQTFEALTKAE
ELTKNNTGLILNFALNYGGRAEITQALKLISQDVLDAKINPGDITEELIGNYLFTQHLPK
DLRDPDLIIRTSGELRLSNFLPWQGAYSELYFTDTLWPDFDEAALQEAILAYNRRHRRFG 
GV* 

Description : 
unknown 

> 3864148-2 ORF translation from 1202-1753, direction F 
VVAYSVLISIMLGTTVFSKSYTIEDAVFPLAMSFYVGFGFNALLDARVAGLDKALLALCI
VWATDSGAYLVGMNYGKRKLAPRVSPNKTLEGALGGILGAILVTIIFMIVDSTVALPYGI
YKMSVFAIFFSIAGQFGDLLESSIKRHFGVKDSGKFIPGHGGVLDRFDSMLLVFPIMHLF 
GLF* 

Description : 

CDP-diglyceride synthetase (cdsA) homolog - Haemophilus
influenzae (strain Rd KW20)

> 3864148-10 ORF translation from 2750-3037, direction F 
VDLLLSLRQVVMLLKMELRIFLYFLAMISINIGIFNLIPIPALDGGKIVLNILEAIRRKP
LKQEIETYVTLAGVVIMVVLMIAVTWNDIMRLFFR*

Description : 
unknown 
Assembly ID: 3864172
Assembly Length: 1352bp

> 3864172 Strep Assembly -- Assembly id#3864172 

CTCGTAAGTTCGGAAGCTATCTACACAAGAAATTAACCGCTGCCTAAAGGAGAAGCCATG 
TCAACATATAACTGGGATGAGAAGCATATCCTTACCTTTCCTGAAGAAAAAGTAGCCCTT 
TCTACTAAGGATGTCCATGTTTACTATGGTAAAAATGAATCCATTAAGGGGATTGATATG 
CAATTTGAAAGAAATAAAATTACAGCTTTGATTGGTCCGTCGGGATCGGGGAAATCTACC 
TACTTACGCAGTCTCAATCGCATGAATGATACCATTGATATTGCTAAAGTAACTGGGCAG 
ATTCTCTATCGTGGAATTGATGTCAACCGTCCAGAAATCAACGTTTATGAAATGCGTAAA 
CACATTGGAATGGTTTTTCAACGCCCCAATCCATTTGCTAAATCGAATTTACCGTAATAT 
TACCTTTGCGCATGAACGTGCTGGAGTTAAGGATAAGCAAGTCCTAGATGAAATCGTAGA 
AACCTCCCTTAGTCAGGCTGCCCTTTGGGATCAGGTTAAAGACGATCTCCACAAGTCAGC 
CTTGACCTTATCAGGTGGTCAGCAACAACGTCTCTGTATCGCTCGTGCCATCTCTGTTAA 
GCCAGATATCCTCTTAATGGATGAGCCAGCCTCAGCCTTGGATCCGATTGCGACCATGCA 
ACTAGAAGAGACCATGTTTGAGCTCAAGAAAAACTTTACCATCATCATTGTAACGCATAA 
TATGCAGCAGGCTGCTCGTGCAAGTGACTATACAGGCTTCTTTTACTTGGGTGATTTGAT 
TGAGTATGACAAGACTGCAACTATTTTCCAAAATGCCAAGCTACAGTCCACCAATGACTA 
TGTATCTGGTCACTTTGGTTAGAAAGGAAACCGTATGACAGATGCGATTTTACAGGTATC 
AGACCTGTCCGTTTATTATAATAAAAAGAAGGCTTTGAATAGTGTTTCCCTATCTTTCCA 
ACCTAAGGAAATTACAGCCTTGATTGGTCCATCTGGATCAGGGAAGTCAACCCTCCTCAA 
GTCTCTCAACCGCATGGGAGATCTCAATCCAGAGGTGACCACAACTGGATCCGTGGTGTA 
CAATGGTCACAACATCTACAGTCCGCGTACAGATACGGTTGAATTACGTAAGGAAATCGG 
AATGGTTTTCCAACAACCTAATCCTTTCCCTATGACTATCTATGAGAATGTTGTCTACGG 
GCTTCGTATCAATGGAATTAAGGATAAGCAGGTTCTGGATGAAGCCGTAGAAAAAGCCTT 
GCAAGGTGCCTCTATCTGGGATGAGGTCAAGGATCGTCTATATGATTCAGCTATTGGATT 
GTCAGGTGGTCAACAGCAGCGTGTCTGCGTGG 

ORF Predictions: 

ORF # Start End Direction Length 



1 311 862 F 184 aa 

> 3864172-2 ORF translation from 311-862, direction F 

VELMSTVQKSTFMKCVNTLEWFFNAPIHLLNRIYRNITFAHERAGVKDKQVLDEIVETSL
SQAALWDQVKDDLHKSALTLSGGQQQRLCIARAISVKPDILLMDEPASALDPIATMQLEE
TMFELKKNFTIIIVTHNMQQAARASDYTGFFYLGDLIEYDKTATIFQNAKLQSTNDYVSG
HFG*

Description : 

HYPOTHETICAL ABC TRANSPORTER (ORF75). - BACILLUS SUBTILIS. 
(BLAST) 
Assembly ID: 3864180
Assembly Length: 2258bp 

> 3864180 Strep Assembly -- Assembly id#3864180 

AACTTCGACCGTGATAAACAAGCTGAGCTTTGACATACTTGTAGCCAACCTAAAAGCCGT 
TCTTCAAGGCCTCAAACCAGCTGCAACTCATTCAGGAAGCCTGGATGAAAATGAAGTGGC 
TGCCAATGTTGAAACCAGACCAGAACTCATCACAAGAACTGAAGAAATTCCATTTGAAGT 
TATCAAGAAAGAAAATCCTAATCCCAGCTGGTCAGGAAATATTATCACAGCAGGAGTCAA 
AGGTGAACGAACTCATTACATCTCTGTACTCACTGAAAATGGAAAAACAACAGAAACAGT 
CCTTGATAGCCAGGTAACCAAAGAAGTTATAAACCAAGTGGTTGAAGTTGGCGCTCCTGT 
AACTCACAAGGGTGATGAAAGTGGTCTTGCACCAACTACTGAGGTAAAACCTAGACTGGA 
TATCCAAGAAGAAGAAATTCCATTTACCACAGTGACTCGTGAAAATCCACTCTTACTCAA 
AGGAAAAACACAAGTCATTACTAAGGGTGTCAATGGACATCGTAGCAACTTCTACTCTGT 
GAGCACTTCTGCCGATGGTAAGGAAGTGAAAACACTTGTAAATAGTGTCGTAGCACAGGA 
AGCCGTTACTCAAATAGTCGAAGTCGGAACTATGGTAACACATGTAGGCGATGAAAACGG 
ACAAGCCGCTATTGCTGAAGAAAAACCAAAACTAGAAATCCTAAGCCAACCAGCTCCTGC 
TGAGGAAAGCAAAGCTCTTCCTCAAGATCCAGCTCCTGTGGTAATAGAGAAAAAACTTCC 
TGAAACAGGAACTCACGATTCTGCAGGGACTAGTAGTCGCAGGACTCATGGCCACACTAG 
CAGCCTATGGACTCACTAAAAGAAAAGAAGACTAAGTCTTTTCGATAAAAAATAAACAGC
GAGATTGAAGCTCGCTGTTTATTTTTTAATTAATCACCTAGTCCAAGACGTTCAAAGATA 
TCATCCACTCGTTTGGTGTAATAAACTGGGTTGAAGATTTCATCGATTTCTTCTTGTGTG 
AGACGTGATGTTACTTCTGAATCTGCCTCAAGAAGTGGTTTAAAGTCTACTTGGTTGTCC 
CAAGAGTAGGCTGTTTTTGGTTGCACCAAGTCATAGGCTTGCTCACGGGTCATGCCTTTT 
TCAATCAATGTCAACATAGCCCGTTGGCTAAAGATAAGACCAAAAGTCGAGTTCATGTTT 
CGGATCATATTTTCTGGGAAGACTGTCAAGTTCTTGACGATATTTCCAAAACGGTTGAGC 
ATGTAGTCAATCAAAATGGTCGTATCTGGTGTGATGATACGCTCAGCTGATGAGTGAGAA 
ATATCGCGTTCGTGCCAGAGAGCGACGTTTTCATAAGCCGTAATCATGTGACCACGAATG 
ACACGCGCCAGACCAGTCATATTTTCAGAACCGATTGGGTTGCGTTTGTGAGGCATTGCT 
GAAGACCCTTTTTGCCCTTTAGCAAAGAACTCTTCTACTTCGCGTTGCTCAGATTTTTGT 
AGACCACGAATCTCAGTCGCCATACGTTCGATTGAAGTCGCAATGCTGGCAAGAACCGCA 
AAGTACTCAGCGTGAAGGTCACGAGGAAGGACTTGTGTTAAAGATTCCTTGGGCACGGAT 
GCCAAGATTTATCGCAGACATACTCCTCTACAAATGGTGGGATATTGGCAAAGTTCCCAA 
CCGCACCAGAAATCTTACCAGCTTCTACACCAGCAGCCGCATGCTCGAAGCGCTCGATAT 
TGCGTTTCATTTCGCTGTACCAAGTTGCTAATTTAAGACCAAAGGTTGTCGGCTCAGCGT 
GCACACCATGAGTACGCCCCATCATGATGGTGAACTTGTGCTCCTTGGCCTTGTCAGCGA 
TGATATTAGTGAAGTTTTCAAGGTCACGACGGATGATGTCGTTGGCCTGCTTGTAGAGGT 
AACCATAAGCAGTATCCACCACGTCGGTAGAAGTTAACCCATAGTGAACCCACTTGCGCT 
CTTCACCAAGAGTCTCAGAAACCGCACGCGTGAAAGCCACCACATCGTGGCGCGTCTCCT 
GCTCAATTTCCAAAATACGGTCGATGTCAAAGTCCGCCTTCTTGCGAATCAAAGCCACAT 
CTTCCTTAGGGATTTCCCCCAACTCAGCCCATGCCTCGTCAGAGAGGATTTCCACCTCAA 
GCCAAGCACGGTATTTATTTTCTTCACTCCAAATATTCGCCATCTCAGGGCGAGAGTAAC 
GGTTGATCATGTGTTAATTTTTCCTTTCTTCTTAAGAT 
ORF Predictions:

ORF # Start End Direction Length 



1 930 1616 R 229 aa

> 3864180-2 ORF translation from 930-1616, direction R 
VPKESLTQVLPRDLHAEYFAVLASIATSIERMATEIRGLQKSEQREVEEFFAKGQKGSSA 
MPHKRNPIGSENMTGLARVIRGHMITAYENVALWHERDISHSSAERIITPDTTILIDYML
NRFGNIVKNLTVFPENMIRNMNSTFGLIFSQRAMLTLIEKGMTREQAYDLVQPKTAYSWD
NQVDFKPLLEADSEVTSRLTQEEIDEIFNPVYYTKRVDDIFERLGLGD*

Description : 

ADENYLOSUCCINATE LYASE (EC 4.3.2.2) (ADENYLOSUCCINASE) (ASL). -
BACILLUS SUBTILIS.
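
For ORFs listed with direction R, the coding strand is the reverse complement of the assembly as printed. Start and End still appear to be given in forward-strand coordinates. The following is a minimal Python sketch of the reverse-complement step. It assumes the sequences contain only A, C, G, T and N, in either case, since one assembly in this listing has a lowercase stretch. The helper name `reverse_complement` is hypothetical.

```python
# Complement table for upper- and lowercase bases; N stays N.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string, preserving case."""
    return dna.translate(COMPLEMENT)[::-1]


# Positions 961-1020 of assembly 3864180, inside the R-direction ORF 3864180-2.
segment = "TCATCCACTCGTTTGGTGTAATAAACTGGGTTGAAGATTTCATCGATTTCTTCTTGTGTG"
print(reverse_complement(segment))
```

The reverse-complemented string can then be read codon by codon in the appropriate frame, in the same way as an F-direction ORF.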

Assembly ID: 3864184 
Assembly Length: 4392bp

> 3864184 Strep Assembly -- Assembly id#3864184

CCCTTTTGCCTCTCCCTTTGGTGCAGATTCTTTTGGGAATTGTGATTGGTCTCTTTTTAC 
CCAATACTGACTTTCATCTTAATACGGAGTTGTTTTTGGCCTGGTTATCGGACCCTTGCT 
TTTCCGAGAGGCTGAAGAAGCAGATGTTACGGCTATTTTAAAACACTGGCGAATCATTGT 
TTATCTCATATTTCCAGTGATTTTTATCTCGACCCTGAGTTTGGGTGGCTTGGCCCATCT 
TCTTTGGTTCAGCCTTCCCTTGGCAGCTTGCTTGGCTGTTGGGGCAGCCCTTGGTCCTAC 
GGACTTGGTGGCCTTTGCCTCTCTTTCGGAGCGTTTTAGCTTTCCTAAGCGCGTGTCCAA 
TATTCTTAAGGGCGAAGGACTCTTGAATGATGCTTCTGGTTTGGTGGCTTTTCAGGTAGC 
TTTGACAGCTTGGACAACTGGAGCTTTTTCTCTGGGGCAAGCTAGCAGTTCGCTCATCTT 
TTCAATCCTAGGCGGTTTTTTAATTGGATTTTTAACAGCCATGACCAACCGCTTCCTCCA 
TACCTTCTTGCTAAGTGTGCGCGCAACGGATATTGCCAGTGAACTTTTATTAGAATTCGA 
GTTTGCCTCTAGTGACCTTCTTTCTGGCAGAAGAAGTCCATGTTTCAGGGATTATTGCCG 
TCGTAGTTGATCGAATTTTAAAGGCAAGTCGCTTCAAGAAAATCACGCTCCTCGAAGCCC 
AAGTGGATACGGTGACCGAGACGGTCTGGCATACAGTGACCTTTATGCTCAACGGTTCTG 
TCTTTGTGATTTTAGGGATGGAGTTGGAAATGATAGCAGAACCTATCTTGACCAATCCAA 
TCTATAATCCTCTACTTTTATTGCTATCTCTCATCGCCCTTACCTTTGTCCTCTTTGTCA 
TTCGTTTTATTATGATCTATGGCTATTATGCCTATAGAACCCGACGCCTAAAGAAAAAGC 
TAAATAAGTATATGAAGGACATGTTTCTCTTGACCTTTTCAGGTGTTAAGGGAACGGTGT 
CGATTGCTACGATTCTCTTGATACCAAGTAATCTAGAACAGGAGTATCCTCTCTTGCTTT 
TCCTTGTTGCAGGTGTGACGCTTGTCAGCTTTTTAACAGGTCTCTTGGTCTTGCCTCATC 
TTTCTGATGAAGAGGAAGAAAGCAAGGATTATCTCATGCATATCGCCATTTTGAATGAAG 
TAACGCTAGAGTTGGAAAAAGAGTTGGAAGACACCAGAAATAAACTTCCCCTCTATGCGG 
CTATTGACAATTCGATCATGGACGTATTGAAAATCTCATTTTAAGCCAAGAAAACCAGGA 
TGATCAAGAAGACTGGGCTGCTTTGAAAATCGAATTCTTAGTATTGAAAGTGATGGTTTG 
GAACAGGCCTATGAAGAGGGGAACATTAGCAATCGTGCTTACCGAGTTTACCAACGTTAT 
CTGAAAAATATAGAACAAGGAATCAATCGTAAACTTGCCTCAAGACTGACCTATTATTTT
CTTGTTTCCTTGAGGATTTTACGTTTTCTTCTTCATGAAGTTTTTACTCTTGGAAAGACC 
TTCCGTAGCTGGAAGGACAAGGAGCAAAGCCGTCTCCGTGCTCTTGATTATGACCAAATT 
GCAGAGCTCTATCTTGCCAATACAGAGATGATTATTGAAAGTTTGGAAAACCTGAAGGGA 
GTCTACAGACGCTCTTTGATTAGTTTTATGCAGGAGTCTCGTCTTCGAGAAACAGCTATT 
ATCAGCAGTGGTGCCTTTGTCGAACGGGTTATCAATCGTGTCAAACCCAACAATATCGAT 
GAAATGCTGAGAGGCTATTATCTGGAGCGCAAGTTGATTTTCGAATACGAAGAAAAACGA 
TTGATTACGACTAAGTATGCCAAGAAATTACGACAAAATGTAAATAACTTAGAGAACTAT 
TCCTTGAAGGAAGCTGCCAATACCCTGCCGTATGATATGGTGGAATTGGTAAGAAGAAAT 
TAGTTAATACTCTTCGAAAATCTCTTCAAACCACGTCAGCGTCGCCTTGGATTATATATG 
TGACTGACTTCGTCAGTTTCATCTACAACCTCAAAGCAGGGCTTTGAGCAACCTGCGGCT 
AGCTTCCTAGTTTGCTCTTTGATTTTCATTGAGTATAAGATTGTAAGTGAAGGAGTGTGA 
CATGAAAAAATGGGGAAAGAGCCTGAACTAGTCCTGTCTACTTTTACCCAATCACACTTC 
CATTTGGTACAGCTGGATCAACTGTGAGAAGGGATCGAATTTGCCATCATGTTCAGCTGA 
GAGAATCATACCCTGGCTGACATATTTTTTCATCATTTTACGTGGTTTGAGGTTAGCAAC 
GATTTGAACTTTCTTGCCGACCAATTCTTGTTCATTTGGATAGTATTTTGCAATTCCTGA 
AAGAATCTGACGATCTTCTCCATCACCAGCATCCAAGCGGAATTGAAGCAACTTATCTGA 
ACCTTCTACTTTAGACACTTCTTTGACTTCTGCGACACGGATTTCAACCTTGTCAAAGTC 
TTCAAACTTGATTTCATCCTTGTTTAGTTTGAGCTCAACTTCGTCCGGATTCCATTCTTT 
TTCGACTGCTGGTTTATTGCCTTCCATTTGTTCCTTGATATAGGCGATTTCTTCTTCCAT 
ATTTAGACGTGGAAAGATAGGTGTTCCTTTGGCAACTACAGTCACATCTGCTGGGAAGTC 
AGCCAAACTCAAGTTTTCAAGACTAGAAACTTCTTCCAAACCAAGTTGAGTCAAAACTGC 
ACGACTAGTTTCCATCATAAATGGTTCAATCAAGTGAGCAACTACACGAATGCTGGCTGC 
CAAGTGGCTCATGACACTTGCCAATTGGTCACGAAGAGCTTCATCCTTGTCCAAGACCCA 
TGGTGCAGTCTCATCGATGTATTTATTGGTACGAGAGATCAGAGTCCAGACTGCTTCAAG 
CGCACGTGGATAGTCAACTGCTTCCATGTGTGTATGGAAGTCTGCGATTGATTTTTCTGC 
AACCTCAGCAAGAACATGATCAAATTCAGTCACACCTTCTACATAGGCAGGGATTTGTCC 
ATCAAAGTACTTATTAATCATGGAAACCGTACGGTTAAGGAGGTTCCCAAGGTCATTAGC 
CAATTCATAGTTGATACGACCGACATAGTCTTCAGGAGTAAAGGTTCCGTCTGAACCAAC
TGGAAGGTTACGCATGAGGTAGTAACGAAGTGGATCTAGTCCATAACGCTCTACCAACAT 
TTCAGGGTAAACGACATTCCCTTTTGACTTAGACATTTTTCCGTCTTTCATGACAAACCA 
ACCATGGGCAATCAAACGATCAGGTAATTTAACATCCAACATCATAAGAAGGATTGGCCA 
GTAGATAGAGTGGAAGCGAAGGATGTCTTTTCCTACCATATGGAAGACTGTTCCATTCCA 
GAACTTGTCAAAGTTACCATGTTCGTCTTGAGCGTAGCCAAAAGCTGTCGCATAGTTAAG 
AAGGGCATCAATCCAAACGTAGACAACGTGTTTTGGATTTGATGGGACAGGCACTCCCCA 
TGTAAAGGTTGTACGAGATACCGCCAAATCTTCCAAACCTGGCTCGATGAAGTTGCGTAG 
CATTTCATTAAGACGACCATCTGGCGTGATAAATTCAGGATGAGCTTTGAAAAATTCGAC 
CAAACGGTCTTGGTATTTGCTAAGGCGAAGGAAGTATGATTCTTCAGAAACCCATTCAAC 
CTCATGACCTGATGGAGCAATACCACCAGTCACATTTCCAGCTTCATCACGGAAAACTTC 
TGCCAGCTGGCTTTCTGTAAAGAATTCTTCGTCTGATACTGAATACCAACCAGAGTATTC 
ACCCAAGTAGATATCATCTTGAGCAAGTAAGCGTTCAAAGACCTGTGCGACAACTTTTTC 
ATGGTAGTCATCGGTTGTACGGATAAATTTATCGTATGAGATATCTAGTAATTGCCAGAG 
TTCTTTAACTCCAACCGCCATTCCATCAACATAGGCTTGAGGTGTAATACCAGATTCGAA 
TTCCGCTTTCTGCTGGATTTTCTGACCATGTTCATCAAGACCTGTCAGATAAAATACATC
GTAGCCCATCAGGCGTTTGTAACGTGCTAGGACATCACATGCGATAGTTGTGTAGGCAGA 
ACCGATATGAAGTTTCCCAGATGGATAGTAAATCGGCGTTGTAATATAAAAATTTTTTTC 
AGACATAATTTTTCCTTTCCAGGCAAATGAAACCTGTTTTTCTAACACTTCATTATATCA 
CATTTTTAATGAATTTCGATAGGGAAATCCATACCAAAACAAGATAGACGAGTGTCCATC 
TTGTTGATCTCATTCATAACGAAGGGCTTCAATTGGATCAAGTTTCGATGCCTTGTTGGC 
TGGCAAGACTCC 

ORF Predictions: 

ORF # Start End Direction Length 



1 197 670 F 158 aa 

2 612 1304 F 231 aa 

> 3864184-1 ORF translation from 197-670, direction F 
VIFISTLSLGGLAHLLWFSLPLAACLAVGAALGPTDLVAFASLSERFSFPKRVSNILKGE 
GLLNDASGLVAFQVALTAWTTGAFSLGQASSSLIFSILGGFLIGFLTAMTNRFLHTFLLS
VRATDIASELLLEFEFASSDLLSGRRSPCFRDYCRRS* 

Description : 
unknown 

> 3864184-2 ORF translation from 612-1304, direction F 
VTFFLAEEVHVSGIIAVVVDRILKASRFKKITLLEAQVDTVTETVWHTVTFMLNGSVFVI
LGMELEMIAEPILTNPIYNPLLLLLSLIALTFVLFVIRFIMIYGYYAYRTRRLKKKLNKY 
MKDMFLLTFSGVKGTVSIATILLIPSNLEQEYPLLLFLVAGVTLVSFLTGLLVLPHLSDE 
EEESKDYLMHIAILNEVTLELEKELEDTRNKLPLYAAIDNSIMDVLKISF* 

Description : 
unknown 

Assembly ID: 3864194 
Assembly Length: 1941bp 

> 3864194 Strep Assembly -- Assembly id#3864194

AATTAGTATTCTCAACCTTTTTATCTTGATAGTTCAAGATGGCATTCGTTGAATTGGTAA 
CATAGTAACTATCCACTCCCTTCAGTTTAGCTGCCTCTTGAACCCAGGATTCTTGCGGTT 
TTGGCGGTTCAACAGGAATTCTTTTTCTTTTCCAGAAACCGTAAAAGCTGATTGTTTCTG 
AGTAAAAGACCCATCTTTACTTTTTTTAGGAGAGAAAAAGACGCTAATATTTTTCTGAGA 
TTTAGTCATATCTTTATTGACTTGACGAGATAGGGAATCACCCAAAGCCATAATCACAAC 
AACTGATGAAACACCGATAATAATCCCAATCATAGTAAGCAAAGAACGCATCTTGTGAGC 
CATGATAGATGAAAAGGCAAATTTCAGATTCTGCATCTTAGTTTTCCTCCTTTCCTAACT 
GAGCACTGTCAGACGAAATGACCCCATCCCGAATGACAATCTGACGTTTGGCATAGGCAG 
caatctcaggcttcatgcgttaccatgataatggtttttccttctttattcaaatcam:c
aataattgcataatttggttacctgttttggtatccaaggctcctgtcggttcatccgct
AGGATAATAGAAGGATTGTTTACCAAGGCACGCGCAATGGCTACACGTTGCTTTTGACCA 
CCAGATAATTCTGAAGGTAAATGGTGACTACGTTCTATCAATTCAACCTTGTCTAAATAT
TCCTCAGCCAACTTGCGACGTTTTGAAGACGAAACTCCTGCGTAAATCAAGGGCAATTCT 
ACATTTTGCAGAGCATTGAGCTTCGATAGAAGAAAGAACTGCTGAAAGACAAAACCGATT 
TGTTGGTTACGGACCTTAGCTAGTTGTTTTTCACCAAGCCCAGCCACTTCTTGACCTTCA 
AGATAATATTCTCCACTGGTTGGTGTATCCAACATGCCAATCGTATTCATCAGAGTGGAC 
TTACCAGACCCAGATGGTCCCATGATGGCTACAAATTCACCCTCATTCACTTCTAGATTG 
ATATTTTTGAGAACCTGCAGTTCTTGGTCACCATTACGGTAACTTCTGAAGATATTTTTT 
AGACTAATTAGTTGCTTCATCAGCCTTCACCTCTTTTCCTTCTTCCAAGGAAGATGTTGG 
ATTACTGATGACCTTAGCACCGTTCGTTAAACCAGAAGTGATTTCTTGATTTTCTGCGTC 
AGCATTTCCCAATGAAACCTCAACTTTTTTAGCCTTTTGTTGTTCATCCACAATCCAGAC 
ATAATTTTTACTATCATCCATTACTAGACTGCTAACAGGAACAAGAATAGCCTTAGTTTT
GCTTTTAACCTCAATGTTGACAGAAAAACCTTGTTTCAAATCACCAACCTCGCCTGTCAC 
ATCAATAGTATAAGGGTATTTAGAACCTGTATTATTCCCGGCTGCTGGACTAGCTGCTTC 
ACCATTGTTTTTAGGATAGTCAGAAATATAGGCTTAATTTCCCAGTCCATTTTTTATCAG 
GATACACTTTAGAAGTAAAGCTTACTTCTTGACCTACAGAAAGGTTGGCTAGATTGTACT 
CAGACAATTCTCCCTTGACTTGTAAATTTTCATTGCTGACAATATGAACCATAACTTGAC 
TCGCCCCTGTTGGAGATTTAGAAACATTGCTATTGACTTCGACTACAGTTCCCTCTAGGG 
TACTGAGAACAGTTGTTGCATCCAATTGACTTTGAGCCTTGCTTAATTGCGCTGCAGCAT 
CTGCACGCGCATCACGGGCATCACCCAATTGAGCATCAATAGAAGCAACAGAATTTCCAG 
CCACTGGAGTTGGGCTTTGCACCGTTGCATCTTCTCCTCCTACTGGCGCTGGTAACTGTG 
GAGCCTGAGCTGAAGCGGCTTCATTTCGTGCTTGATTGAGTTCATTGATATGACGATCTG 
CCTTAGCTACTGCTCGACTAG

ORF Predictions: 

ORF # Start End Direction Length 



1 1084 1380 R 99 aa 

> 3864194-3 ORF translation from 1084-1380, direction R 
VTGEVGDLKQGFSVNIEVKSKTKAILVPVSSLVMDDSKNYVWIVDEQQKAKKVEVSLGNA 
DAENQEITSGLTNGAKVISNPTSSLEEGKEVKADEATN* 

Description : 
unknown 

Assembly ID: 3864338 
Assembly Length: 1335bp

> 3864338 Strep Assembly -- Assembly id#3864338 

ATCGAATTCCCTATTTTAACACTTTCTTTTCTAAAACAGTCTATATTTTATTTCAAACTG 
TATTATATTTTTGAAAAAATAAAGTCCTTTTTTCTTTTTTTCAGAAAAAAGGGTATA^TA
AAAGAAAATAAGCAGTAACACTCAATGGAAATCGAAAAAGCAAACTAGGAAGCTAGCCGC
AGATTGCTCAAAACACTGTTTTGAGGTTGCAGATAGAGCTGACGTGGTTTGAAGAGATTT 
TCGAAGAGTATAAAAAGGTGCTAGGCATGTTGATTTTTCCTTTGTTAAATGATTTGTCAA 
GAAAAATCATCCATATTGGACATGGATGCCTTTTTTGCTGCAGTGGAAATCAGGGATAAT 
CCTAAACTCAGAGGAAAACCTGTCATTATTGGAAGCGACCCTCGGCAAACAGGTGGACGG 
GGAGTCGTTTCTACCTGTAGTTATGAGGCAAGAGCTTTTGGTGTCCATTCTGCCATGAGT 
TCCAAGGAAGCTTATGAACGTTGTCCCCAGGCTGTCTTTATCTCAGGGAATTCGATGAGA 
AATACAAGTCTGTGGGACTCCAGATTCGAGCTATTTTTAAGCGCTATACAGATTTGATTG 
AACCCATGAGCATTGACGAAGCCTATTTGGATGTGACAGAAAATAAACTCGGTATCAAGT 
CAGCGGTCAAAATTGCTCGCCTCATTCAAAAAGATATCTGGCAAGAACTCCATCTAACTG 
CTTCCGCAGGCGTTTCTTACAACAAATTCTTAGCTAAAATGGCGAGTGATTATCAAAAAC 
CACATGGTTTGACAGTGATTCTACCTGAACAGGCTGAGGATTTTCTCAAACAAATGGATA 
TTTCCAAATTTCATGGAGTAGGAAAAAAGACAGTAGAACGTCTTCATCAAATGGGCGTTT 
TTACTGGTGCTGATTTACTTGAAGTTCCTGAGGTAACCCTAATAGACCGTTTTGGTAGAC 
TAGGCTATGATCTGTATCGAAAGGCTCGTGGCATTCACAACTCTCCAGTCAAATCCAATC 
ACATCCGTAAATCAATCGGCAAGGAGAAAACCTACGGGAAGATTCTCCGTGCTGAGGAAG 
ATATCAAAAAAGAGAGCTGACTCTTCTATCAGAAAAAGTCGCTCTCAATCTACATCAACA 
AGAAAAAGCTGGAAAAATTGTCATTTTGAAAATCCGCTACGAGGACTTTTCAACTCTTAC 
CAAACGAAAAAGTATTGCTCAAAAAACACAAGATGCTAGTCAGATAAGCCAAATAGCCCT 
GCAACTCTATGAAGAATTAAGTGAGAAAGAAAGAGGTGTCCGCCTATTGGGGATTACCAT 
GACTGGATTTTAAAG

ORF Predictions: 

ORF # Start End Direction Length 



1 552 1100 F 183 aa 

> 3864338-2 ORF translation from 552-1100, direction F 
VGLQIRAIFKRYTDLIEPMSIDEAYLDVTENKLGIKSAVKIARLIQKDIWQELHLTASAG 
VSYNKFLAKMASDYQKPHGLTVILPEQAEDFLKQMDISKFHGVGKKTVERLHQMGVFTGA 
DLLEVPEVTLIDRFGRLGYDLYRKARGIHNSPVKSNHIRKSIGKEKTYGKILRAEEDIKK 
ES* 

Description : 

ECODINJ NCBI - Escherichia coli (substrain W3110, strain K-12) 
DinP, DNA damage inducible protein 

Assembly ID: 3864360 
Assembly Length: 1796bp 

> 3864360 Strep Assembly -- Assembly id#3864360

TCCAAGCTAGCTATTTCGTGGAAGGGGCTTCGGTTGGCAGAACCTGGTGAATTTACCCAA 
ACGTGCTTTTTTAAACGGTCGCGTAGACTTGACACAGGCAGAGGCTGTGATGGATAT£L^T
CCGTGCCAAGACTGACAAGGCCATGAACATTGCGGTCAAACAATTAGACGGCTCCCTTTC 
TGACCTCATTAACAATACCCGTCAAGAAATCCTCAATACACTTGCCCAAGTTGAGGTCAA 
TATCGACTATCCTGAATATGATGATGTTGAGGAAGCTACTACTGCCGTTGTCCGTGAGAA
GACTATGGAGTTTGAGCAATTGCTAACCAAGCTCCTTAGGACAGCACGTCGTGGTAAAAT 
CCTTGGTGAAGGAATTTCAACGGCTATCATTGGACGTCCCAACGTTGGGAAATCAAGCCT 
TCTCAACAACCTCTTGCGTGAGGACAAGGCTATCGTAACCGATATCGCTGGGACAACACG 
AGATGTCATCGAAGAGTACGTCAACATCAATGGTGTTCCTCTAAAATTGATTGACACAGC 
TGGTATTCGTGAAACGGATGATATCGTTGAACAAATCGGTGTTGAGCGTTCGAAAAAAGC 
CCTCAAGGAAGCCGACTTGGTTCTACTAGTGCTAAATGCCAGTGAACCACTGACTGCGCA 
AGACAGACAACTTCTTGAAATTAGCCAAGATACCAATCGCATTATTCTACTTAATAAAAC 
CGACCTGCCAGAAACGATTGAAACTTCGAAACTACCTGAAGACGTTATCCGTATTTCAGT 
CCTTAAAAACCAAAACATCGACAAGATTGAAGAGCGAATCAACAACCTCTTCTTTGAAAA 
TGCTGGCTTGGTCGAGCAAGATGCTACTTACTTGTCAAACGCCCGTCACATTTCCCTGAT 
TGAAAAAGCAGTTGAAAGCCTACAAGCCGTTAATCAAGGTCTTGAGCTGGGGATGCCAGT 
TGATTTGCTTCAAGTTGACTTGACTCGTACTTGGGAAATCCTCGGAGAAATCACTGGGGA 
TGCTGCTCCAGATGAACTCATCACCCAACTCTTTAGCCAATTCTGTTTAGGAAAATAAGA 
AAAATCCATGATCCTTCATTCGGTCATGGATTTTATTGTCTTTATTAGTAATCTGGTCTT 
AAGACCCCTGTTACAGTTGCCTTAGTTGCTTCGTAGTCGCCATCTACGACAACCTTGATA 
ATGCGTTTGACATCTTCTTCTGGTGCTGGAACAAGAGGTAGACGAGTGGGTCCAGCTTCA 
AATCCCATATAGTTAAGAATTGCCTTAACTGGAGCAGGACTTGGATAAGAGAAGAGAGCA 
TTAACCTTAGGAATGAATTTACGCTGAATTGCTGCGGCTTTCTTCATATCGCTTTCTGCA 
ATGGCAGTAAACATCTCGTGCATTTCATCGCCATTTGTATGAGAGGCAACAGAAATAACC 
CCATCCGCCCCAAGGTTCATGGCATGGAAAGCATCTCCATCCTCACCTGTATAAATCAAG 
AACTCTTCAGGCTTGTGCTCAATCAAGTAAGCCATATTAGCCAAGCTAGTACATTCTTTG 
ACACCGATAATATTTGGATGGTCAGCCAAGCGAAGCATGGTTTCTGGAGTCAATTCGACA 
ACTACACGCCCTGGAATGTTATAGATAATAATTGGTAGGTCAGAAGCATCTGCAATAGCC 
TTAAAGTGCTGATACATCCCTTCTTGAGAAGGTTTGTTGTAGTAAGGAACAATAGCAAGC 
CCAGCTGCGAAACCACCAAATTCCGCTACTTCTTTGACAAACTCAATAGAGTCACG 

ORF Predictions: 

ORF # Start End Direction Length 



1 47 1078 F 344 aa 

> 3864360-1 ORF translation from 47-1078, direction F 

VNLPKRAFLNGRVDLTQAEAVMDIIRAKTDKAMNIAVKQLDGSLSDLINNTRQEILNTLA
QVEVNIDYPEYDDVEEATTAVVREKTMEFEQLLTKLLRTARRGKILREGISTAIIGRPNV
GKSSLLNNLLREDKAIVTDIAGTTRDVIEEYVNINGVPLKLIDTAGIRETDDIVEQIGVE
RSKKALKEADLVLLVLNASEPLTAQDRQLLEISQDTNRIILLNKTDLPETIETSKLPEDV
IRISVLKNQNIDKIEERINNLFFENAGLVEQDATYLSNARHISLIEKAVESLQAVNQGLE
LGMPVDLLQVDLTRTWEILGEITGDAAPDELITQLFSQFCLGK*
Description:
THIOPHENE AND FURAN OXIDATION PROTEIN THDF. - ESCHERICHIA COLI.

Assembly ID: 3864388 
Assembly Length: 2337bp

> 3864388 Strep Assembly -- Assembly id#3864388 

CTTCGTACAGGTGGTTCCTATGCAAGGGTGGAAGCCAATCGTCAGAACAACAAGCATCTT 
CATCAAGCCAGAACTGGAGCAATTACAAAAAGAAATTGCTGAAGAAGAAGCAAGCTTGGG 
TTCAGAAGAAGTGGCTTTGAAGACCTTGCAAGATGAGATGGCCAGATTGACCGAGTCATT 
AGAAGCTATTAAATCTCAAGGAGAGCAGGCACGTATTCAGGAGCAAGGCTTGTCCCTCGC 
TTATCAGCAAACTAGTCAGCAAGTTGAAGAACTGGAAACTCTTTGGAAACTCCAAGAAGA 
GGAAATAGATCGTCTTTCCGAGGGAGATTGGCAAGCGGATAAGGAAAAATGCCAAGAGCG 
TCTTGCTGCAATCGCCAGTGACAAGCAAAATCTGGAAGCTGAGATTGAAGAGATTAAGTC 
TAATAAAAATGCCATCCAAGAACGCTATCAAAACTTGCAGGAAGAGCTAGCGCAAGCTCG 
TTTGCTTAAGACAGAACTGCAAGGGCAAAAACGTTATGAAATTGCTGATATTGAACGCTT 
AGGCAAGGAATTGGACAATCTTGATTTTGAACAAGAGGAAATCCAGCGCCTTCTTCAAGA 
AAAGGTTGACAATCTTGAGAAGGTTGATACAGAATTGCTCAGTCAACAGGCGGAAGAATC 
CAAAACTCAGAAAACGAACCTCCAACAAGGTTTGATTCGCAAACAGTTTGAGTTGGATGA 
TATAGAAGGTCAGCTGGATGATATTGCTAGTCATTTGGATCAGGCTCGCCAGCAGAATGA 
GGAGTGGATTCGCAAGCAAACACGTGCTGAAGCTAAGAAAGAAAAGGTCAGCGAGCGCTT 
TGCCGCCATCTACAAAGTCAATTAACAGACCAGTACCAGATTAGCCATACTGAAGCTCTA 
GAAAAAGCGCATGAATTGGAAAACCTCAATCTGGCAGAGCAAGAAGTTAAGGATTTAGAG 
AAGGCTATTCGCTCACTGGGTCCTGTCAATATAGAAGCTATTGACCGGTACGAAGAAGTT 
CACAACCGTCTGGACTTTCTAAATAGTCAGCGAGATGATATTTTGTCAGCGAAAAATCTG 
CTCCTTGAAACCATTACAAAGATGAATGATGAGGTTAAGGAACGCTTTAAATCAACCTTT 
GAAGCTATTCGTGAGTCCTTTAAAGTGACCTTCAAGCAGATGTTTGGCGGAGGTCAGGCA 
GACTTGATATTGACTGAGGGCGACCTTTTACAGCTGGTGTGGAGATTTCTGTTCAACCTC 
CAGGTAAGAAAATCCAGTCGCTTAACCTCATGAGTGGTGGTGAAAAAGCCCTATCGGCTC 
TTGCCTTGCTTTTCTCCATTATTCGTGTCAAGACCATTCCTTTTGTCATCTTGGATGAGG 
TGGAAGCTGCGTTGGATGAAGCCAATGTTAAACGTTTTGGGGATTACCTCAACCGCTTTG 
ACAAGGACAGCCAGTTTATCGTCGTAACCCACCGTAAGGGAACCATGGCAGCGGCCGATT 
CCATCTATGGAGTGACCATGCAAGAATCGGGTGTTTCAAAGATTGTTTCAGTTAAGTTAA 
AAGATTTAGAAAGTATTGAAGGATGACAATTAAACTAGTAGCAACGGATATGGACGGAAC 
CTTCCTAGATGAGAATGGGCGCTTTGATATGGACCGCCTCAAGTCTCTCTTGGTTTCCTA 
CAAGGAAAAAGGGATTTACTTTGCGGTGGCTTCGGGTCGGGGATTTCTGTCTCTGGAAAT 
CGAATTATTTGCTGGTGTTCGTGATGACATTATTTTCATCGCGGAAAATGGCAGTTTGGT 
AGAGTATCAAGGTCAGGACTTGTATGAAGCGACTATGTCTCGTGACTTTTATCTGGCAAC 
TTTTGAAAAGCTGAAAACGTCACCTTATATAGATATCAATAAACTGCTCTTGACGGGTAA 
GAAGGGTTCATATGTTCTAGATACGGTTGATGAGACCTATTTGAAAGTGAGTCAGCATTA 
TAATGAAAATATCCAAAAAGTAGCGAGTTTGGAAGATATCACAGATGACATTTTCAAATT 
TACAACCAACTTCACAGAAGAAACGCTAGAAGCTGGTGAAGCTTGGGTCAATGATAATGT 
CCCTGGTGTCAAGGCTATGACAACTGGCTTTGAATCTATTGATATTGTTCTGGACTATGT 
CGATAAGGGTGTAGCTATTGTTGAATTAGCTAAAAAACTTGGCATCACAATGGATCAQST
CATGGCTTTTGGAGACAATCTTAATGACTTACATATGATGCAGGTTGTGGGACATCCTGT 
AGCTCCTGAAAATGCACGACCAGAGATTTTAGAATTAGCATAAGACTGTGATTGGTC 

ORF Predictions: 

ORF # Start End Direction Length 



1 1239 1586 F 116 aa 

> 3864388-3 ORF translation from 1239-1586, direction F 
VEISVQPPGKKIQSLNLMSGGEKALSALALLFSIIRVKTIPFVILDEVEAALDEANVKRF 
GDYLNRFDKDSQFIVVTHRKGTMAAADSIYGVTMQESGVSKIVSVKLKDLESIEG*

Description : 

P115 protein - Mycoplasma hyorhinis (SGC3) (similarity to
SMC1_YEAST, chromosome segregation protein)

Assembly ID: 3864406 
Assembly Length: 2162bp 

> 3864406 Strep Assembly -- Assembly id#3864406 

CTAAAAGTGAAGCCCGATAGCGTCTCTCTCCTGCAAGGATTTCATAACCAATAACAGGAG
ATTGACGAACAATAATCGGTTGAATGACCCCATTTTCTTTGATAGACTGTGCTAGTTCAT 
CTAGCTTTTCTCTATCAAATTCTTTTCGGGGTTGATAGGGATTTTTTTGTATATCTGTGA 
TAGAAATCATTTCAAATTTTTCCATGATTCTACACTAACACATCTTTTCTCTTATGTAAA 
GCTTTCTTTACATAGATGTCAATTAAGATTCTAAATCACCTGAACTCTTGTTAAGTTTGA 
TAGAGGTAGTTTCTTCTTTCCCGTTACGATAGTAGGTTATCTTAATGGTGTCTCCGATAG 
AATGGTTGTAAAGAGCACTTTGTAAGTCTGTTGATGAAGCAATCTCTTTGTCATCTACTT 
TTGTAATTACATCGTATTTTTCAAGGTGACCATTGGCAGGCATATTACTTTGTACCGAAC 
GAACAATTACACCAGATGTAACATTACTTGGAATATTGAGTCTTCTGATGTCGCTTGTAC 
TCACATTAGATAAATTAACCATCTGGATTCCCAAAGCTGGACGCGTCACTTTTCCGTTTT 
TTTCTAACTGTTCAATAATATTGATAGCATCATTTGCAGGAATTGCGAAACCAAGACCTT 
CTACAGATGTTCCTCCATTTGTAGCAATTTTACTTGAGGTAATTCCGATAACCTGCCCTT 
GAATATTGATCAGTGGGCCGCCAGAGTTACCTGGGTTAATAGCAGTATCAGTTTGGATGG 
CTTTTGTAGAAATAGCTTGTCCATCTTCCGATTTTAAGGATACATTTCTATTGAGACTGG 
ATACGATACCTTGAGTGACAGTATTTGCATATTCAGAACCTAACGGGCTACCGATGGCAA 
TAGCAGTTTCTCCTACAGTTAACTTACTAGAATCACCAAACTCAGCTACTGTTGTCACTT 
TTTCTGAAGAGATTTCGACGACAGCAATATCAGAGAAAGTGTCAGCTCCGACAATTTCTC 
CAGGTACTTTAGTCCCATCTGACAATCGAATATCTACTTTGCTGGCGCCATTTATAACGT 
GATTGTTGGTGACGATGTAAGCTTCTTTATCATTCTTTTTATAAATAACTCCAGATCCTT 
CACTAGAGATTCGCTGAGAATCTGTGTCAGTATCATCATTGCCAAATACGCTATTTTGTC 
TGTTTGCCGAATAAGTAATAACAGAAACAACAGCATCTTTTACTTTGTTAACGGCCTGTG 
TTGTTGAATTTTCCGTTCCTTATAGGCAGTTTGTGTAATAGTACTATTGTTGTTAGAGTT 
GTTTACACTACTTTTTTGAGTTAGTTGAGTTATTGAAAAACTACCCAAGGCTCCACTAAA
AAAGCTAATGACGATAACGACTAATAATTGAAACCATTTTTTGTAAAATGTTTTTAGATG 
TTTCATATTTGCCTCCATATGTTTGAATTACTGAAAGTATAAACTGACTAGCTTAATTAT 
AACTTAAACACAAAAGTTTTACACAAACTGTGGATAACTCTTTTGAAACTGTGATTTTCT 
TAATTGAAATCTATTTTTTATTTTGTGAATAAGATGTGAAAAAATAGAGAATATGTTAGA 
ATAGAGTCATGAAAATTAAAGTTGTAACAGTTGGGAAACTGAAAGAAAAGTATTTAAAAG 
ATGGTATCGCAGAGTATTCAAAACGAATTTCTAGATTTGCTAAGTTTGAAATGATTGAGT 
TATCAGATGAAAAAACACCAGATAAGGCCAGTGAATCAGAAAATCAAAAGATTTTAGAAA 
TAGAAGGTCAGAGAATTTTATCAAAAATTGCTGACCGTGATTTCGTTATTGTGTTAGCCA 
TTGAAGGGAAAACTTTCTTCTCAGAAGAATTTAGTAAGCAGTGAGAAGAAACTTCTATAA 
GGAAGGATGTCTACTCTTACTTTTATTATTGGGGGAAGTTTAGGATTGTCATCATCTGTA 
AAAAATAGAGCCAATCTTTCTGTCAGTTTTGGTCGCCTAACCTTGCCTCATCAGTTAATG 
AGACTAGTTCTTGTTGAACAAATCTATCGCGCTTTTACGATTCAGCAGGGATTCCCCTAC 
CATAAATAGAGAATTGACTTTTAATTGAATTTTTGGTAGAATAATTGTGTTAGGTCTCAT 
AG 

ORF Predictions: 

ORF # Start End Direction Length 



1 263 958 R 232 aa 

> 3864406-1 ORF translation from 263-958, direction R 
VTTVAEFGDSSKLTVGETAIAIGSPLGSEYANTVTQGIVSSLNRNVSLKSEDGQAISTKA 
IQTDTAINPGNSGGPLINIQGQVIGITSSKIATNGGTSVEGLGFAIPANDAINIIEQLEK
NGKVTRPALGIQMVNLSWSTSDIRRLNIPSW^ 

VDDKEIASSTDLQSALYNHSIGDTIKITYYRNGKEETTSIKLNKSSGDLES* 
Description : 

Bacillus subtilis (strain 168, ) DNA. Homologous to E. coli 
serine protease HtrA (BLAST) 

Assembly ID: 3864452 
Assembly Length: 1766bp

> 3864452 Strep Assembly -- Assembly id#3864452 

ATCGAATTTTCCAAAATGGGGAGCTAGAGCAGTGGAGTGATTATGTGGCAGACGATTTGA 
TTCAGCATAATCATGAGATTGGACAAGGAAGTGCTGCTTATAAAAACTATGTGGCTGAAT 
ATATTGTCACTTTTGACTTCGTTTTCCAACTCTTAGGACAAGGAAACTATGTGGTTAGCT 
ATGGTCAGACTCAGATTGATGGCGTTGCTTATGCCAAGTACGATATCTTCCGTTTAAAGA 
ACGGGAAAATTGTGGAGCATTGGGATAATAAGGAAGTCATGCCTAAGGTAGAAGACTTGA 
CCAATCGAGGGAAGTTTTAAATTGAGGACAAAGAATGATTGAATACAAAAATGTAGCACT 
GCGCTACACAGAAAAGGATGTCTTGAGAGATGTCAACTTACAGATTGAGGATGGGGAATT 
TATGGTTTTAGTAGGGCCTTCTGGGTCAGGTAAGACGACCATGCTCAAGATGATTAACCG 
TCTTTTGGAACCAACTGATGGAAATATTTATATGGATGGGAAGCGCATCAAAGACTA^3A
TGAGCGTGAACTTCGTCTTTCTACTGGTTATGTTTTACAGGCTATTGCTCTTTTTCCAAA 
TCTAACAGTTGCGGAAAATATTGCTCTCATTCCTGAAATGAAGGGGTGGAGCAAGGAAGA 
AATTACGAAGAAAACAGAAGAGCTT-TTGGCTAAGGTTGGTTTACCAGTAGCCGAGTATGG 
GCATCGCTTACCTAGTGAATTATCTGGTGGAGAACAGCAACGGGTCGGTATTGTCCGAGC 
TATGATTGGTCAGCCCAAGATTTTCCTCATGGATGAACCCTTTTCGGCCTTGGATGCTAT 
TTCGAGAAAACAGTTGCAGGTTCTGACAAAAGAATTGCATAAAGAGTTTGGGATGACAAC 
GATTTTTGTAACCCATGATACGGATGAAGCCTTGAAGTTGGCGGACCGTATTGCTGTCTT 
GCAGGATGGAGAAATTCGCCAGGTAGCGAATCCCGAGACAATTTTAAAAGTGCCTGCAAC 
AGACTTTGTAGCAGACTTGTTTGGAGGTAGTGTTCATGACTAATTTAATTGCAACTTTTC 
AGGATCGTTTTAGTGATTGGTTGACAGCTACAATGACATTGGTCGGTTCCTTGAGCAAGA 
GATAGATTAGCCAGACAGTCATGCCCAAAATCCCTCCAGGTAAGAGCATAGACCGTTGCA 
CATTAAGTACGATTAAAAAAGTGATAATGGCAAGAAAACTTGCTACTGCTTGTAATAAAA 
AGGTTGTTAGTGTCATATTAGTTCATCAATACCAAGGCGACAGAAGTTCCTGCCCCTAAA 
GCGAGGGTAATGAGCAGGGATTCAAACATCTTACTCATACCAGAGTTTATGTGGTTGGTC 
ATAATATCACGGACCGCATTGGTCAAGGCAATACCTGGTACAAACGGCATGACCGCACCA 
GCTATAATCAAATCTGCCGTTGAAGGAAAACCTGTGTAGCGAGCCCAAAACTGGGCAATT 
ATCCCAAAGACAAAAGCTCCAGCAAAGGCTGTCACAAAGGGAATTCGGATAAATTTTTCC 
ACATAGAGGGAAAAGGCAAAACCAAATAAGGTCGCCACTCCTGCCCCAAGTGCGTCGTAG 
ATATTTCCGCTAAACATAACTGAAAAGAAAGGAGCACTAAAGGTCGCAGCCAGAGTTACC 
TGCAACTTAGTATAGGGAAGGGGTTGAGCTTGCAAGGCCGTCAATTGCTTAAAGGCTGTT 
TCTAAGTCAATCTGCCCCCCAACTGG 

ORF Predictions: 

ORF # Start End Direction Length 



1 1079 1201 R 41 aa 

> 3864452-2 ORF translation from 1079-1201, direction R 
VQRSMLLPGGILGMTWLIYLLLKEPTWIVAWQSLKRS* 

Description : 
unknown 

Assembly ID: 3864458 
Assembly Length: 1705bp

> 3864458 Strep Assembly -- Assembly id#3864458 

CTCTGACGGAGGCTGGTTATGTGGGTGAGGATGTGGAAAATATACTCCTCAAACTCTTGC 
AGGTTGCTGACTTTAACATCGAACGTGCAGAGCGTGGCATTATCTATGTGGATGAAATTG 
ACAAGATTGCCAAGAAGAGTGAGAATGTGTCTATCACACGTGATGTTTCTGGTGAAGGGG 
TGCAACAAGCCCTTCTCAAGATTATTGAGGGAACTGTTGCTAGCGTACCGCCTCAAGGTG 
GACGCAAACATCCACAACAAGAGATGATTCAAGTGGATACAAAAAATATCCTCTTCATCG 
TGGGTGGTGCTTTTGATGGTATTGAAGAAATTGTCAAACAACGTCTGGGTGAAAAAGT£A
TCGGATTTGGTCAAAACAATAAGGCGATTGACGAAAACAGCTCATACATGCAAGAAATCA 
TCGCTGAAGACATTCAAAAATTTGGTATTATCCCTGAGTTGATTGGACGCTTGCCTGTTT 
TTGCGGCTCTTGAGCAATTGACCGTTGATGACTTGGTTCGCATCTTGAAAGAGCCAAGAA 
ATGCCTTGGTGAAACAATACCAAACCTTGCTTTCTTATGATGATGTTGAGTTGGAATTTG 
ACGACGAAGCCCTTCAAGAGATTGCTAATAAAGCAATCGAACGGAAGACAGGGGCGCGTG 
GACTTCGCTCCATCATCGAAGAAACCATGCTAGATGTTATGTTTGAGGTGCCGAGTCAGG 
AAAATGTGAAATTGGTTCGCATCACTAAAGAAACTGTCGATGGAACGGATAAACCGATCC 
TAGAAACAGCCTAGAGGTGACTATGGAACTTAATACACACAATGCTGAAATCTTGCTCAG 
TGCAGCTAATAAGTCCCACTATCCGCAGGATGAACTGCCAGAGATTGCCCTAGCAGGGCG 
TTCAAATGTTGGTAAATCCAGCTTTATCAACACTATGTTGAACCGTAAGAATCTCGCTCG 
TACATCAGGAAAACCTGGTAAAACCCAGCTCCTGAACTTTTTTAACATTGATGACAAGAT 
GCGCTTTGTGGATGTGCCTGGTTATGGCTATGCTCGTGTTTCTAAAAAGGAACGTGAAAA 
GTGGGGGTGCATGATTGAGGAGTAATTTAACGACTCGGGAAAATCTCCGTGCGGTTGTCA 
GTCTAGTTGACCTTCGTCATGACCCGTCAGCAGATGATGTGCAGATGTACGAATTTCTCA 
AGTATTATGAGATTCCAGTCATCATTGTGGCGACCAAGGCGGACAAGATTCCTCGTGGTA 
AATGGAACAAGCATGAATCAGCAATCAAAAAGAAATTAAACTTTGACCCAAGTGACGATT
TCATCCTCTTTTCATCTGTCAGCAAGGCAGGGATGGATGAGGCTTGGGATGCAATCTTAG 
AAAAATTGTGAGGAAAAGAAAATGGCAAAAACAATTCATACAGATAAGGCCCCAAAGGCT 
ATCGGGCCCTATGTTCAAGGAAAAATCGTTGGCAACCTTTTGTTTGCTAGCGGTCAAGTT 
CCCCTATCCCCTGAAACTGGGGAAATTGTAGGAGAGAATATCCAAGAACAGACAGAGCAA 
GTCTTGAAAAACATCGGTGCTATTTTGGCAGAAGCAGGAACAGACTTTGACCATGTTGTC 
AAAACAACTTGTTTCTTGAGCGATATGAACGACTTTGTTCCTTTTAATGAGGTTTACCAA 
ACGGCCTTCAAAGAGGAATTCCCAG 

ORF Predictions: 

ORF # Start End Direction Length 



1 797 1105 F 103 aa 

2 1179 1391 F 71 aa 

> 3864458-2 ORF translation from 797-1105, direction F 
VTMELNTHNAEILLSAANKSHYPQDELPEIALAGRSNVGKSSFINTMLNRKNLARTSGKP 
GKTQLLNFFNIDDKMRFVDVPGYGYARVSKKEREKWGCMIEE*

Description : 
unknown 

> 3864458-3 ORF translation from 1179-1391, direction F 
VQMYEFLKYYEIPVIIVATKADKIPRGKWNKHESAIKKKLNFDPSDDFILFSSVSKAGMD
EAWDAILEKL* 



Description : 
HYPOTHETICAL 22.0 KD PROTEIN IN LON-HEMA INTERGENIC REGION
(ORFX). - BACILLUS SUBTILIS.
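In the ORF Predictions tables of this section, the Length value appears to count codons including the terminal stop codon. For ORF 3864458-2, for instance, 797-1105 spans 309 nt, or 103 codons, and the printed translation has 102 residues followed by "*". A minimal sketch of a consistency check under that assumption follows; the regular expression and the function names are hypothetical helpers written for this listing's row layout, not part of any published tool.

```python
import re

# One data row of an "ORF Predictions" table, e.g. "1 797 1105 F 103 aa".
ORF_ROW = re.compile(r"^\s*(\d+)\s+(\d+)\s+(\d+)\s+([FR])\s+(\d+)\s*aa\s*$")


def parse_orf_row(line: str):
    """Return (orf_number, start, end, direction, length), or None for non-data lines."""
    match = ORF_ROW.match(line)
    if match is None:
        return None
    number, start, end, direction, length = match.groups()
    return int(number), int(start), int(end), direction, int(length)


def length_matches_span(start: int, end: int, length: int) -> bool:
    """True when the End - Start + 1 nucleotides form exactly `length` whole codons."""
    span = end - start + 1
    return span % 3 == 0 and span // 3 == length


rows = [
    "ORF # Start End Direction Length",  # header row: ignored
    "1 797 1105 F 103 aa",
    "2 1179 1391 F 71 aa",
]
for line in rows:
    parsed = parse_orf_row(line)
    if parsed is not None:
        _, start, end, direction, length = parsed
        print(parsed, length_matches_span(start, end, length))
```

The check uses only the coordinates, so it applies equally to direction F and direction R rows.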

Assembly ID: 3864474 
Assembly Length: 1673bp

> 3864474 Strep Assembly -- Assembly id#3864474

ACGTTTTGGGAACTGTTCGGATAGCAGATTCCGAACAAACTGATAATGGTTGGCAAAATC 
ATTATTCCTAATAGTAACGAAGCTGGTTAGGACAACTCATGCCATTTCCTAAAAAGGTTT 
TAATCCAAGGCACCAATAATTGTAGGCCGAAAAAACCATAAACAATAGATGGAATGGCTG 
CCATCAAGTTGATAGCTGATTTTAAGAAGCTATAGACGGGCTTTGGACAATTATAAACCA 
TAAACACCGATGTCAAGATCGCCTGTTGGCACCCCAATCACAATCGCTCCTAAGGTCGAA 
TAAATAAGGAACCAACGATCATTGGTAAAATACCATAGCTTGCCGGAATGTTCGTTGGCG 
ACCAATCACTGCCTAATAAAAAACGGGCAAAGCCGTAGTTAGCTATGAAAGGTAAGCCAT 
TACTAAAAATAAAGAAACAGATTAGCAAAATAGCTACAACAGCTACTGTTGCACTCATGA
AAAAAATTGCCCTAAAAACTGCTTCTTTGAAGGCTTGTTTTGTCACATCTTGTCCTTTCT 
AGTGAAGAAAGTAAGGGAGATACGACACCTCCCTACTTGCCTTCTTTATCTTATTGTACG 
ATGAAACGTCTGCATCTCTTTAGAGATTTATGGAGCAAACATTTTATTTAATCTTGTCCC 
AGGTGGTTAATTTGCCACTAAAAACGTCCGCAAGTTCAGCCATACTGACTTGGCTTGCCT 
TATTGTCATTATTGACCACAACAGCAATACCGTCTAAAGCAATAGCATCATGGGTGAGAC 
TCTTACCTTCTTCAGGAGTTAATTCCCTAGAAACCATACCAATATCAGCGGTTTTCTCCT 
TAACAGCGGTAATACCTGCTGAAGACCCATTAGAGGTAATATCAATCGTAACTTCTGGAT 
TTTCTTTTTTATAAGCTTCTGCTAATTTTTCCATTAAAGAAGATACTGAAGTGGAACCTA 
CAACAGACAACTTGCCTGATAAGTGTTGGCTTGTATATTCTGTGGTTTCGGTTTTAGCTT 
CAATAAATTTATTATCTGTGACCACTTGTTGACCTTGTTTGGAGTGGATAAAGCTGATAA 
AATCTTGACCTAGCTTGGAAAGATTAGAAGACCAAACAATGTTGAAGGGACGTTGAAGAG 
GGTATTCACCATCTAAAACTGTGTCTCGACTAGCCTTGACACCATCAATCTCTAAAGCCT 
TGACAGATTTCGTTAAAGATCCCAAGGAGATGTAGCCGATAGCATTAGCATTCCCTTGAA 
CTGCTGAGAGAACACCTTCTGTACTATTTTGAATCACAGCTGTTTTGGCAGTGTAGTCAA 
TTTTTTTATCACCGTCTTTTTTGAGAATCCCTGTGATTTCTGTGAAGGCACCCCGTGTTC 
CAGAGCCATTTTCTCGTGAAATCACCTCAATCGTTCCTGGAGCTGACTGTTTGGAAGCAG 
CTGACTGATTGCCACAGGCAACAAGCCCAAATCCTGATAAGCCAATGGCTGCAAGAGTAA 
GCATTTTTTTGAATTTCATAATAATCACCTTTATCTCTATGTATTTTTCTTGTGTAGGCT 
TACTACATTTATAGTCTAACAAGTCTTTGTAAAGGTTTATCCCTGATTCATGTAAAGATT 
GTGTAAAGAATCAAAAAAAGCCACTTTTGAAAAATGGCTGCCCCTAAAAATAG 

ORF Predictions: 

ORF # Start End Direction Length 



1 68 247 R 60 aa 

2 644 1528 R 295 aa 

> 3864474-1 ORF translation from 68-247, direction R 
VFMVYNCPKPWSFLKSAINLMAAIPSIVYGFFGLQLLVPWIKTFLGNGMSCPNQLRYY*

Description :

PROBABLE ABC TRANSPORTER PERMEASE PROTEIN (ORF72). - BACILLUS
SUBTILIS. (BLAST) 
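For direction R entries, the Start and End values appear to be coordinates on the printed (forward) assembly strand, with the ORF read from End back toward Start on the complementary strand. The sketch below assumes that convention and the standard genetic code; `reverse_complement` and `reverse_orf` are illustrative names. As a check, positions 68-88 of assembly 3864474, read this way, give the last six residues and the stop codon of ORF 3864474-1.

```python
# Standard genetic code (NCBI translation table 1), codons in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}
# Complement map; N (unknown base) stays N.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(dna: str) -> str:
    """Complement every base, then reverse, giving the opposite strand 5'->3'."""
    return dna.translate(COMPLEMENT)[::-1].upper()


def translate(dna: str) -> str:
    """Translate codon by codon; unrecognised codons become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def reverse_orf(assembly: str, start: int, end: int) -> str:
    """Translate a direction-R ORF whose 1-based, inclusive span is Start..End."""
    return translate(reverse_complement(assembly[start - 1:end]))


fragment = "CTAATAGTAACGAAGCTGGTT"  # positions 68-88 of assembly 3864474
print(reverse_orf(fragment, 1, 21))  # NQLRYY*
```

Under this convention the stop codon of a direction R ORF sits at the low-numbered Start coordinate (here positions 68-70, CTA on the printed strand, read as TAG).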

> 3864474-2 ORF translation from 644-1528, direction R 
VIIMKFKKMLTLAAIGLSGFGLVACGNQSAASKQSAPGTIEVISRENGSGTRGAFTEITG 
ILKKDGDKKIDYTAKTAVIQNSTEGVLSAVQGNANAIGYISLGSLTKSVKALEIDGVKAS 
RDTVLDGEYPLQRPFNIVWSSNLSKLGQDFISFIHSKQGQQWTDNKFIEAKTETTEYTS 
QHLSGKLSWGSTSVSSLMEKLAEAYKKENPEVTIDITSNGSSAGITAVKEKTADIGMVS 
RELTPEEGKSLTHDAIALDGIAVWNNDNKASQVSMAELADVFSGKLTTWDKIK* 

Description : 

probable hemolysin precursor - Streptococcus agalactiae (strain 
74-360) 

Assembly ID: 3864510 
Assembly Length: 1702bp

> 3864510 Strep Assembly -- Assembly id#3864510 

CTTTTTTATTTCACAACAAGTTCATAACGTGTCTTACTGGTGAAGGTTTGACCAGCTTTA 
AGAATGACTTGGCCTTTAAGGTCACTGTGAATGGCATCTGGTAAAGCTTGCGCTTCAAGA 
GCAATCCCATTGTGCTGTAGCATTGGCTGACCTCCTATGATGACACTTTCATCCACAAAG 
TTTGCTGTGTAGACCACAAAGCAAGGAGCTTCTGTCTTGAAAAGCAGGAAGCGACCTGAA 
TTTTGGTCATAAAGGAATCCAGCATTGTCATGGCCTGCAGGAAGGGCAAATGGATGATCC 
AAACCTGATGCCAGCTGGATTTGCTCATCTTCTTCTGCAAAGATATCCTTCAACAAGGCA 
CCATTGTAGATGTGTTTGACCACATCACGGTTGGCTTCTGGAGTTTTGGCAGGAACACCG 
TCAGGAGCGATTGAGTAAATGCCCTCTGTGTTTAGTTGGAAGACATGACGGTCAATCGTC 
TGCGTGAAATCACCAGACAAGTTGAAATAGCTGTGGTTGGTTGGATTGACCAGCGTATCC 
TGATCGGTCGTTACCTTGTAGATCGAATTCATGGAGGCACCAGTTTCTTCCAAGTGATAA 
CTGATCGCCAAATCTTGAGATTTCCAGGGAACCCTCCTGTCCCATCTGTACGCTCTGTGT 
AGAGAGTCAAGCCATGATCGCTTACTTCTTCAACTTCAAACAAGCTGGAATCCCAACCAG 
TTGAACCACTGTGATTACAGTTGCTAGCATTATTAACCTCAAGGTCATAGGTCTTACCAT 
TGAGCTCAAAGGTCGCACCTGCAATACGACCCGCTACAGGACCTACACTTGCTCCATGCT 
TGGGACTATTGCCTACATAACTATCAAAGTCATCAAATCCCAAGATAACATTGGCAAAAT
TTCCAGCCTTGTCAGGTGCGACATAGCGCAAGATAGTCGCACCATAAGTCATAACCTCAA 
GTTGGTAGCCACCGTCTGTCTCAAATCGATAGGCCAAGACATCCTCACCCTCAACATTTC 
CAAATACACGCTCTGTGTATGCTTTCATTCTGTTCTCCTTTTACTATTTCTCTCAAGCAA 
ACAAACCATAGAAAGCGTACTGACAATCTATGGTTTATCTGATAATTTACAAATCCTCTT 
GTCAAGAATTCATAAACACTGTCTTACTTTTGATATTCGTGAATTATGACACCTTGTACT 
ACACGGTTTACTGTACCTGTAGGAGACGGTGTATCTGGTTTATTTTCTACCTTGAGTGAA 
GTCAATAGGGCAAAGAGTTGGGCATAAACGATGTAAGGGAAGACACGGTAAATATCATTC 
AAGACACCGCCACAACCAAGGGCCACTTCTTTGACATTTTCAAGACCAAAAGCTTGATCA
CTCAAAAGCACAACACGACGAGCAATCTGGTCACCAGCAACTTCACGAACCAAGTCCAAG 
TCGTACTTACGAGTGTAGTCCGTCGTTGTACCAAAGACCAAAACAACTGTATTGTCGTTG 
ATAAGAGATTTTGGACCGTGACGGAAGCCAACTGGGCTTTCATACATGGTCGGAACTTGA 
CCAGCAGTTAATTCCAAAATCTTGAGCTGAGCTTCATGAGCAAGTCCAAAGAAAGGACCA 
GCGCCTAGAATAGATGACACGGTTAAAGTCTAAATCAACGAGATCTTTGACATCTTCTGC 
CTTGTCTAAAACTTTACGGGCA 

ORF Predictions: 

ORF # Start End Direction Length 



1 1164 1640 R 159 aa 

> 3864510-3 ORF translation from 1164-1640, direction R 
VSSILGAGPFFGLAHEAQLKILELTAGQVATMYESPVGFRHGPKSLINDNTWLVFGTTT 
DYTRKYDLDLVREVAGDQIARRWLLSDQAFGLENVKEVALGCGGVLNDIYRVFPYIVYA 
QLFALLTSLKVENKPDTPSPTGTWRWQGVIIHEYQK* 

Description : 

AGAS PROTEIN. - ESCHERICHIA COLI. (Probable tagatose-6-phosphate
ketose/aldose isomerase)

Assembly ID: 3864526 
Assembly Length: 1940bp

> 3864526 Strep Assembly -- Assembly id#3864526 

TGCAGGATTTGATTTGGACGACTTTTATTATTACCAGATTCGCCTAGGAATAGAAAAAAG 
AGCCCAAGAGTTGGACTATGATATCTTGCGCTATTTTAATGACCACCCTTTTACCCTAAG 
CGAGGAAGTGATTGGGATTCTCTGCATCGGAAAGTTTAGTCGAGCTCAGATTTCTGCCTT 
TGAAGAATACCAAAAGCCTCTTGTATTTCTAGACAGCGATACACTTTCCCTGGGACATAC 
CTGTATTATCACGGATTTTTACACTGCTATGAAACAGGTTGTCGATTATTTCCTCAGTCA 
AGGAATGGACCGTATCGGGATTCTAACAGGCCTTGAAGAAACAACAGACCAAGAAGAAAT 
CATTCAGGACAAGCGTCTAGAAAACTTCAAAAACTACAGTCAAGCGAGGGGAATCTATCA 
TGATGAACTGGTCTTTCAAGGAAGATTTACTGCCCAGTCTGGCTATGACTTAATGAAGGA 
GGCCATTCAGAGCTTGGGAGACCAACTTCCGCCAGCATTTTTCGCAGCCAGCGATAGTTT 
AGCTATCGGTGCCCTCCGTGCCCTCCAAGAAGCTGGAATCAGCCTGCCAGATCGCGTCAG 
CCTCATTTCCTTTAACGACACTAGTCTGACCAAACAGGTCTATCCTCCCCTCTCTAGTAT 
TACAGTTTATACTGAAGAAATGGGCCGAGCAGGTATGGATATTCTTAACAAGGAAGTCCT 
CCACGGTCGGAAAATCCCTAGCCTGACCATGCTGGGAACCAGACTGACATTAAGAGAAAG 
TACCCTAAATCAAGAATAGGATAACATAAAAAACGAATAGAGTTCTAAAACTCCTATTCG 
TTTTTTATTCGATTACAATCATAGACTTAATGGTCTTACGTTCATCCATATCTTTGTAGG 
CTTGGTCGATATCTTCCAGTTTATAACTTGAAGTAAAGACGCGACCTGGATTGATATCAC 
CATCAAGGACGGCTTTTAGTAAAAATTGCTTATCGTATGTTGTAGCAGAAGCTGCCCCAC 




WO 98/19689 



PCT/US97/19226 



CTGCTACAGAGATATTTTGCATAAATGTCGAACCAAGAGCACGATTATTATAGTGTGSGA 
CTCCTACAAAGCCCATACGCCCTCCATTATGAAGAACACCTAGCGCCTGTTCTATAGCAG 
CCTCCGTACCAACACATTCAAGTGCTGCGTCTGCTCCTCCGCCGAGGATTTCACGCACCT 
TGGTAATTCCTTCTTGACCACGTTG-TGCAACAACAGCTGTCGCACCTGACTCGATAGCCA 
TCTTTTGACGGTCTTCATGACGGCTCATAAGGATAATTTGTGATGCTCCACGCATCTTAG 
CCGCGATGACAGCACATTGACCAACAGCCCCATCACCGATAACAACAACCTTGTCCCCTT 
TTTGAACATTTGCAACACGCGCCGCATGATAGCCTGTCGGCATGACATCTGCAAGAGTCA 
AAAGGGACTTGAGCATCCCTTCTGTATAGTCAGAAGGTTGACCAGGGATTTTAACCAGCG 
CCCAGTTTGCATAGTGGAAGCGAATATATTCTGCCTGAAAATCACCCCCCAAATTATTGC 
CAATATGATTGTCGCAAGAACCGTCAAATCCAGCAAGACAGGCATCACACTCACCACATC 
CATGTGTAAAAGGGACAATCACAAAATCACCTGGTTTCACCGTCGTAATGGCTTCCCCAG 
CTTCTTCAACAATCCCAATCGCTTCGTGTCCACTTATTTTTTGTGTCCAACTTTCGTTTT 
CCNTGGATTACGGTACCTCCATAAATTTGAACCACAAACGCACGCACGAACCACACGAAT 
AATCACATCATCCGCTTCTATTATTTGCGGACGTTCAATGCTAGCAAGTCCAACCTGACC 
TGCCTTTGTATATACTGCTGATTTCATTTAAAATTTTCCTTCCTTATAAAGTTTAATTTT 
GAGATTTAAACGATTTAAAG

ORF Predictions: 

ORF # Start End Direction Length 



1 845 1660 R 272 aa 

> 3864526-2 ORF translation from 845-1660, direction R 
VKPGDFVIVPFTHGCGECDACLAGFDGSCDNHIGNNLGGDFQAEYIRFHYANWALVKIPG
QPSDYTEGMLKSLLTLADVMPTGYHAARVAJWQKGDKVWIGDGAVGQCAVIAAKMRGAS 
QIILMSRHEDRQKMAMESGATAWAERGQEGITKVREILGGGADAALECVGTEAAIEQAL 
GVLHNGGRMGFVGVPHYNNRALGSTFMQNISVAGGAASATTYDKQFLLKAVLDGDINPGR 
VFTSSYKLEDIDQAYKDMDERKTIKSMIVIE* 

Description : 

ALCOHOL DEHYDROGENASE (EC 1.1.1.1). - ALCALIGENES EUTROPHUS.

Assembly ID: 3864548 
Assembly Length: 2051bp 

> 3864548 Strep Assembly -- Assembly id#3864548

ATCGAATTTTTCTAGCCAGGCTACAGTTTTGGCAAGTAAGGTTTCATCTCAGGCAGTCAA 
CTGGGTGAGTGCCTTTATTAGCGGAGCTTCTCAAGTGATTGTTGCCTTGATTATCGTTCC 
TTTCATGCTCTTTTATCTCTTGCGTGATGGGAAAGGCTTGCGTAACTATTTGACCCAATT 
CATTCCAAGAAAATTGAAGGAACCTGTTGGACAAGTTCTATCAGATGTGAATCAACAGTT 
GTCCAACTATGTTCGAGGGCAAGTGACAGTGGCTATTATTGTAGCAGTAATGTTTATCAT 
CTTCTTCAAGATTATTGGTCTACGCTATGCGGTTACGCTGGGGGTTACTGCTGGTATTTT 
AAATCTGGTCCCTTATCTTGGTAGCTTTCTAGCCATGCTTCCTGCCCTAGTATTGGGTTT 
GATTGCTGGTCCAGTCATGCTTTTGAAAGTAGTGATTGTCTTTATTGTAGAACAAACTAT
TGAAGGCCGTTTTGTCTCTCCATTGATTTTGGGAAGTCAATTAAACATCCACCCTATTAA 
TGTTCTCTTTGTTTTGTTAACTTCAGGATCTATGTTTGGTATCTGGGGAGTTTTACTTGG 
TATTCCGGTTTATGCCTCTGCTAAGGTTGTCATTTCAGCCATTTTCGAATGGTATAAGGT 
AGTCAGTGGTCTATATGAATTAGAGGGTGAGGAAGTCAAGAGTGAACAATAGTCAACAGA 
TGTTACAGGCTTTGGAGGAGCAAGATTTAACTAAGGCTGAGCATTATTTCGCCAAAGCTT 
TAGAAAATGATTCAAGTGATCTTCTGTATGAGTTGGCAACTTATCTTGAAGGGATTGGTT 
TCTATCCTCAGGCCAAGGAAATTTACCTGAAAATTGTAGAAGAATTTCCAGAGGTTCATC 
TTAATCTAGCTGCAATGGCTAGCGAGGATGGTCAAATAGAAAAAGCCTTTAACTATCTTG 
AGGAAATCCAAGCTGACAGTGACTGGTATGTCTCGCTCTTTGGCTCTGAAGGCAGACCTA 
TACCAGCTGGAAGGTTTGACAGATGTGGCACGTGAGAAATTATTGGAGGCCTTGACCTAC 
TCAAAGGATTCTCTCTTGATATTGGGTTTGGCAAAGTTGGATAGTGAGTTGGAAAATTAC 
CAAGCGGCTATTCAAGCCTATGCCCAGTTAGATAATCGCTCGATTTATGAGCAAACGGGC 
ATTTCCACCTATCAACGAATTGGCTTTGCCTATGCTCAGTTAGGGAAATTTGAAACGGCT 
ACTGAGTTTTTAGAAAAAGCCCTGGAGTTAGAATACGATGACTTAACAGCTTTTGAGTTG 
GCCAGTCTTTATTTTGATCAAGAAGAATATCAAAAAGCCACCCTCTACTTTAAGCAGCTT 
GATACCATTTCTCCTGACTTTGAAGGCTATGAGTATGGGTACAGTCAGGCTTTACATAAG 
GAACATCAAGTTCAAGAAGCCCTGCGTATCGCTAAGCAAGGATTAGAGAAAAATCCCTTT
GAAACTCGCCTCTTGCTAGCTGCTTCACAATTTTCTTATGAATTGCATGATGCTAGTGGT 
GCAGAAAATTATCTCCTTACTGCAAAAGAAGACGCTGAGGATACAGAAGAAATCTTGCTT 
CGTTTAGCCACTATTTATCTGGAGCAGGAGCGTTATGAGGATATTCTAGACTTGCAGAGT 
GAGGAGCCAGAAAATCTTTTGACCAAGTGGATGATTGCTCGTTCTTATCAAGAAATGGAC 
GATTTGGATACTGCTTATGAGCATTATCAAGAGTTGACAGGAGATTTGAAGGACAATCCA 
GAATTTCTGGAACACTATATCTATCTCTTGCGTGAATTGGGACATTTTGAAGAAGCAAAA 
GTCCATGCTCACACTTACTTAAAACTGGTTCCAGATGATGTGCAAATGCAAGAACTGTTT 
GAGAGATTGTAAGAATGTTTAAACATATAGAACTGTAGTTTATCTCTTTTGATAGCTACG
GTCTTTATTTGTACATGGTAGAATCTTTTTACAAAAATACTTGGTAATCTTGTTTATTCA 
TGCCATAATAG 

ORF Predictions: 

ORF # Start End Direction Length 



1 687 1055 F 123 aa 

2 979 1932 F 318 aa 

> 3864548-2 ORF translation from 687-1055, direction F 
VRKSRVNNSQQMLQALEEQDLTKAEHYFAKALENDSSDLLYELATYLEGIGFYPQAKEIY 
LKIVEEFPEVHLNLAAMASEDGQIEKAFNYLEEIQADSDWYVSLFGSEGRPIPAGRFDRC 
GT* 

Description : 
unknown

> 3864548-3 ORF translation from 979-1932, direction F
VTGMSRSLALKADLYQLEGLTDVAREKLLEALTYSKDSLLILGLAKLDSELENYQAAIQA 
YAQLDNRSIYEQTGISTYQRIGFAYAQLGKFETATEFLEKALELEYDDLTAFELASLYFD 
QEEYQKATLYFKQLDTISPDFEGYEYGYSQALHKEHQVQEALRIAKQGLEKNPFETRLLL 
AASQFSYELHDASGAENYLLTAKEDAEDTEEILLRLATIYLEQERYEDILDLQSEEPENL 
LTKWMIARSYQEMDDLDTAYEHYQELTGDLKDNPEFLEHYIYLLRELGHFEEAKVHAHTY 
LKLVPDDVQMQELFERL*

Description : 
unknown 

Assembly ID: 3864582 
Assembly Length: 1318bp 

> 3864582 Strep Assembly -- Assembly id#3864582 

CTTTAGCAATCAGTTTATTGGGAGATTTGACTGCCACTTCTGTTGGAACCTTGATAATCT 
TTTTACCCTCAAAGCGTTCCATACCAGAAATCTTAACATCAACTGCTAAAATAACTACAT 
CCGCTGCATCAATCTGCTCTTGACTCAATTCATTTTCTACCCCTATTGTCCCCTGAGTCT 
CAACATGAATCACATGTCCAGCTACCTTTGCGGCATTCTCTAATTTTTCCTGTGCAATAT 
AAGTGTGGGCAATTCCCATAGTACAAGCTGCAACACCAACAATTTTCATACGGATACCCT 
CCAAAATTTTTTCTTATTAACAAAAAGCTGCAATCACATCATCAGATGTCTGAGCCCGAA 
CTAATTTGGCAACAACTTCGTCATTACCAAGTTTTCGAGCAAAGAGTGATAAGGTCTTCA 
AATGCTCCCTAGCAGCTTCTGTATCATCACCAACTGCAAAGAGTACAATTACTTTGACCC 
CTTTCCCATCAATGGTCTCCCAAGGAATCTCATTGTGATTTATAGCTATGACTACCCCCG 
CCTTCTCCACAGCAGAACTCTAGCTATGGGGAATAGCAATATAATTCCCAATACCGGTCT 
GTCCTTCTGCCTCTCTCTGATAAAGACCTTCGATAAATTGGTCTCTATCAGACACATAAC 
CCGTCTCAACCAATAGTATGAGCTAATGCCTCAAAAACCTCTTCTTTGCTCTGCATCTGT 
AAATCCGTCTGGATCAGACTCACATTAAGAATATCTTTGATTTCCATATATTATCTCCCG 
TAATTCTTCTTTTGTTAACTGTTTTAATTGATTTATGAATGATTCATCTGCTAGTCTTCT 
CATCAATGTTTTAATACATGACTTGTCCTGTGATACTGCAATGGCCAAACCGATAATAAG 
GTCAACACACTGGATATCCTTCGACCATTCTCTGATAGGTGGTTTTAATCTAGTAATCAC 
TAAGACATGATGTTGAAAGTTTCCTTCACAATGTGGTAGAAGAACACCTTTAGCAACCTC 
TATACTTCCCTGTCTCTCACGGTAATATAGAAGCTCTTCTATTTTTTCTGTATCTTCAGA 
AACAAGAAGGCTGATTTGATTTGCTAATTCTTTGTAGGCTTCTTGACGATTTTGAACAGA 
TATATCCATAAGGACAAGCGAAAGATTATTCATAGTTTATCTCCTGAATTTTTGCTTGAA 
GACGTTGTTTATCACCCTCGGTTAGAAAAGCACTAACTAGGACAAACGGGACACTTGCTG 
GTTCCTGCAAAGCTACCGTCGTCACAATGAAATCTAAATCTGGATATAGATTTATCAG 

ORF Predictions:

ORF # Start End Direction Length 



1 317 550 R 78 aa

> 3864582-1 ORF translation from 317-550, direction R
VEKAGVVIAINHNEIPWETIDGKGVKVIVLFAVGDDTEAAREHLKTLSLFARKLGNDEVV
AKLVRAQTSDDVIAAFC*

Description : 

Probable phosphotransferase enzyme IIA component

Assembly ID: 3864604 
Assembly Length: 2077bp

> 3864604 Strep Assembly -- Assembly id#3864604 

CTAGTCTTGGCTACTGTCTAAGTTGGCTTGTGCATAAGCCTGCCAGATTTTTTGTTGGGG 
TTTGGCAAGTGGGTAATTCTTGAATTCTTCTGGTGAAAGCCAACGAACTTCCCTATCTGA 
AAAATCATGGAAGTCACTCACCTGACCTGCTACAATCTGTACATGCCATTTTCGATGACT 
AAAAACATGCTGGACTGTATCAAAACAAACATCAAGCCAATCAACATCTAGGTCATAGTC 
CTGCTGGAAACTCTCTTCTGGGACTGGGGCCAGAGTTCACACTTTCTTCCGCAACCTGAT 
GAAAGAGGTCAAACTGCTCTTCTTGCGAAAAGTTATCAACTTCTATAAAGGGGAAATGCC 
AAAAACCTGCCAAGAGCTTTTCGCTTTCATTTTTTTCAAGTAAAAATTGTCCTTGAGAAT 
TTTTCACAACTAAGGCTTTAAGATAAATAGGAACCGGCTTTTTCTTAGGAGATTTAATTG 
GATAACGGTCCATGGTTCCATTCTGATATGCCGCACTAAAGTCCTTGACTGGGCTTTCTT 
CAGGTCTGGGATTTACAGGAGACTCAATATCAGACCCTAAGTCCATCAAGGCTTGATTAA 
AATCACCCGGACGATCTGGATTAATCAAGATCTCCATCATTGCCTGAAAAATTTTTCGAT 
TACTTGGAATCCCAATATCGTGGTTGACTTCAAACAGACGCGCCAAGACCCGCATGACAT
TACCATCTACAGCTGGCTCAGGCAAGTTAAAAGCAATACTGGAAATGGCTCCTGCTGTGT 
AAGGTCCAATCCCTTTCAAGCTGGAAATTCCTTCATAGGTATTTGGAAATTGGCCACCAA 
AGTCAGTCATAATCTGCTGGGCTGCAGCCTGCATATTGCGAACTCGAGAATAATAACCCA 
AGCCCTCCCAAGCTTTCAGTAAACTCTCCTCAGGCGCAGTTGCCAGACTTTCGACAGTTG 
GAAACCAGTCCAAAAATCTTTCGTAGTAAGGGATAACTGTATCCACCCTGGTCTGCTGAA 
GCATGATTTCAGATACCCAGATGTGATAAGGATTTTTACTTCTCCTCCAAGGCAAATCTC 
TTTTGTTTTCATCATACCAAGCGAGAAGTTTTCTCACCGGAAAGAAATGACTTTCTCCTC 
CGGCCACATGACGATACCGTATTCTTTCAAATCCTAACATATCTCTAGTTATAACACAGA 
AGGTTTCACCTGTCTTTGTATCTGATTTATAATATTTTCAATAGATAGTATATAACTTTT 
CCTATCTACTTATACTCCAATGAAAATCCAAAGAGCAAACTAAGAAGCTAGCCGCAGGTT 
GCTCAAAACACTGTTTTGAGGTTGTGGATAGAACTGACAGAGTCAGTATCATATTACCTA 
CGGCAAGGTGAAGCTGACGTAGTTTGAAAAGATTTTCGAAGAGTATAAATCTTATTGATG 
AACTGCTTGCAGTCTGAGAAAAAATGAGCTTGGATATTATTTCCAAACTCACTTAAAGTC 
AATTTCAATCCACTAGAACAAGCCTAGTACAGTTCCATCGCTTTCAACATCCATGTTGAG 
AGCTGCTGGACGTTTTGGAAGACCTGGCATGGTCATAACATCACCAGTTAAGGCAACGAT 
GAAGCCTGCACCTAATTTTGGTACCAATTCACGAATGGTAATTTCAAAGTTTTCTGGTGC 
TCCAAGCGCATTTGGATTGTCTGAGAAACTGTATTGAGTTTTAGCCATACAAATTGGCAA 
TTTGTCCCAACCGTTTTGAACGATTTGAGCAATTTGTGTTTGAGCTTTCTTCTCAAAGTT 
CACTTTGCTACCACGATAGATTTCAGTGACAATTTTTTCAATCTTTTCTTGGACAGAAAG 
GTCATTATCGTACAAACGTTTATAGTTAGCTGGATTTTCAGCAATTGTCTTAACAACTGT 
TTCGGCAAGTGCTACTCCACCTTCTGCTCCATCAGCCCAGACACTAGCCAATTCAAQXGG
TACATCGATTGAGGCACAGAGTTCTTTTAAGGCTGCAATTTCAGCTTCTGTATCAGATAC 
AAATTCGTTAATAGATACAAGCTAATGGAATACCGAA 

ORF Predictions: 

ORF # Start End Direction Length 



1 1 141 R 47 aa 

2 1513 1803 R 97 aa 

> 3864604-1 ORF translation from 1-141, direction R 
VSDFHDFSDREVRWLSPEEFKNYPLAKPQQKIWQAYAQANLDSSQD* 

Description : 
unknown 

> 3864604-3 ORF translation from 1513-1803, direction R 
VNFEKKAQTQIAQIVQNGWDKLPICMAKTQYSFSDNPNALGAPENFEITIRELVPKLGAG 
FIVALTGDVMTMPGLPKRPAALNMDVESDGTVLGLF* 

Description : 

FORMATE--TETRAHYDROFOLATE LIGASE (EC 6.3.4.3)
(FORMYLTETRAHYDROFOLATE SYNTHETASE) (FHS) (FTHFS). -
CLOSTRIDIUM ACIDI-URICI.

Assembly ID: 3864610 
Assembly Length: 1887bp 

> 3864610 Strep Assembly -- Assembly id#3864610 

CTCAAAACNCTGCTTTGAAGAGATTTTCAAAGAGTACAAGAAGTTTAGTTATTAGCGTTC 
TTACCGCTTGTAAACTAGATTTCTCATAAAATAGAATCTTTTCCTTTTAGTTGTAAACTA 
GTCTGGGAGAGTAGAGAGGTTTGAGATACCTTTCTAGCTTTTGGATTATCATCTAAGAAG 
AGTAATTTCCCTTGCATTAAAAAGGGGAAAAAGAGACACGAAATGACTATAATGGGTGAC 
AATGGGGGAAGGGATAGACAAGAGATTTTATCCACATATGAAAAAAGGAGGTTAGGAAAG 
AGTTATATATCCTATATTATATAAATAATCAATTGCGCAGAAATTTGGTAAGAATTCATG
CGTCAACTCATAAAGAACTACTTAAAAAATTCACAGTATTCATAATTATTTTCGAGGAGA 
AAAACAGTGAAAAAAAGAAAAAAGCTTGCTCTGTCTCTTATCGCTTTTTGGCTGACGGCT 
TGTTTAGTAGGCTGTGCTAGCTGGATTGATCGTGGAGAATCCATAACGGCTGTTGGCTCA 
ACTGCCTTGCAACCCTTGGTTGAAGTAGCGGCAGATGAATTTGGCACCATCCATGTTGGA 
AAAACGGTCAATGTCCAAGGGGGAAGTTCTGGTACAGGCTTGTCCCAGGTTCAGTCTGGG 
GCAGTTGATATAGGAAACTCAGATGTATTTGCTGAGGAAAAAGACGGAATTGATGCTTCT 
GCTCTTGTTGACCACAAGGTCGCGGTAGCTGGCTTGGCTCTGATTGTCAATAAGGAGGTT 
GATGTTGATAACCTAACGACAGAGCAACTTCGTCAAATCTTCATAGGTGAGGTAACCAAT 
TGGAAAGAGGTTGGTGGTAAGGACTTACCCATCTCTGTTATCAATCGGGCAGCCGGCTCT
GGCTCTCGTGCTACCTTTGATACTGTCATTATGGAAGGTCAGTCTGCCATGCAAAGTCAG 
GAGCAGGATTCAAATGGAGCGGTAAAATCAATCGTATCAAAAAGTCCAGGAGCTATCTCT 
TATTTATCTCTTACCTATATAGATGATTCGGTCAAAAGCATGAAGTTGAATGGCTATGAC 
TTAAGTCCAGAAAATATAAGTAGCAATAATTGGCCCTTGTGGTCTTATGAGCATATGTAT 
ACATTGGGGCAGCCCAATGAGTTGGCTGCAGAATTTCTCAATTTTGTTCTCTCGGATGAG 
ACCCAAGAAGGGATTGTCAAAGGATTGAAGTATATTCCGATTAAGGAAATGAAGGTTGAA 
AAAGATGCTGCCGGAACTGTGACAGTGTTGGAAGGGAGACAATAATGAATCAAGAAGAAT 
TAGCTAAGAAAATGTTGCTTCCATCAAAGAATTCTCGTCTGGAGAAATTAGGAAAAGGTT 
TGACCTTTGCCTGTCTTTCTTTGATAGTCATCCTTGTGGCCATGATTTTGGTTTTCGTAG 
CGCAAAAAGGCTTGTCGACCTTCTTTGTCAATGGTGTGAATATCTTTGACTTTCTTTTGG 
GAGGAACTTGGAATCCTTCTAGTAAAGAATTTGGTGCCCTTCCTATGATTTTGGGTTCCT 
TTATCGTTACCATTCTCTCAGCCCTTATCGCAACACCCTTTGCTATTGGTGCAGCAGTTT 
TTATGACCGAAGTATCACCAAAAGGGGCGAAGATTTTGCAACCAGCTATTGAACTCCTGG 
TTGGGATTCCTTCAGTAGTGTACGGATTTATTGGCTTGCAAGTCGTCGTTCCCTTTGTTC 
GCAGTGTCTTTGGTGGGACTGGTTTTGGGATTTTGTCAGGGATTTCCGTCCTCTTTGTCA 
TGATTTTGCCGACCGTAACCTTTATGACAACGGATAGCTTGCGTGCGGTTCCTCCNTTAT 
TATCGTGAAGCCAGTTTCGCTATGGGA 

ORF Predictions: 

ORF # Start End Direction Length 



1 427 1305 F 293 aa 

> 3864610-1 ORF translation from 427-1305, direction F 
VKKRKKLALSLIAFWLTACLVGCASWIDRGESITAVGSTALQPLVEVAADEFGTIHVGKT
VNVQGGSSGTGLSQVQSGAVDIGNSDVFAEEKDGIDASALVDHKVAVAGLALIVNKEVDV 
DNLTTEQLRQIFIGEVTNWKEVGGKDLPISVINRAAGSGSRATFDTVIMEGQSAMQSQEQ 
DSNGAVKSIVSKSPGAISYLSLTYIDDSVKSMKLNGYDLSPENISSNNWPLWSYEHMYTL 
GQPNELAAEFLNFVLSDETQEGIVKGLKYIPIKEMKVEKDAAGTVTVLEGRQ* 

Description : 

PROBABLE ABC TRANSPORTER BINDING PROTEIN PRECURSOR (ORF108). -
BACILLUS SUBTILIS. (BLAST) 

Assembly ID: 3864716 
Assembly Length: 405bp

> 3864716 Strep Assembly -- Assembly id#3864716 

CTGAGGAATCAAAAGTTGAACCACCAGTAGAACAAGCATAAGTCCCAGAACAACCCGTGC 
AACCTACACAAGCTGAGCAACCAAGTACACCAAAAGAATCATCACAACAAGAAAATCCTA 
AAGAAGATAGGGGAGCGGAAGAGACTCCGAAACAAGAAGATGAACAGCCAGCAGAAGCCC
AAGAAATCAAGGTTGAAGAACCAGTAGAATCTATAGAGGAGACTGTCATTCAACCTGTTG 
AACAACCAAAAGTGGAAACGCCTGCTGTTTAATAACTAACGGAACCTACAGAGGAACCTA
AAGTTGAAGTAACTAGTATTCCCCTCACTACTCGCTATGAGGAAGACCTTACTTACGAAC 
ACGGAACGCGTTGAAGTTGTTAAGGAAGGTTATAATTGGCAGTAT 

ORF Predictions: 

ORF # Start End Direction Length 



1 57 272 F 72 aa 

> 3864716-1 ORF translation from 57-272, direction F 
VQPTQAEQPSTPKESSQQENPKEDRGAEETPKQEDEQPAEAQEIKVEEPVESIEETVIQP 
VEQPKVETPAV* 

Description : 
unknown 

Assembly ID: 3864718 
Assembly Length: 1542bp 

> 3864718 Strep Assembly -- Assembly id#3864718 

CTATGGGATTGGTAGTTCTTCCTAGTGCAGGGGCTGTAGACCCAGTTGCGACCCTAGCGC 
TGGACTAGTCGAGAGGGTGTTGTTGAAAATGGATGGCTATCGCTATGTTGGTTATCTATC 
AGGTGACATCCTCAAAACGCTTGGCTTGGACACTGTTTTAGAAGAAACCTCAGCAAAACC 
TGGAGAGGTGACTGTAGTCGAAGTTGAGACTCCTCAATCAACAACAAATCAGGAGCAAGC 
TAGGACAGAAAACCAAGTAGTAGAGACAGAGGAAGCTCCAAAAGAAGAAGCACCTAAAAC 
AGAAGAAAGTCCAAAGGAAGAACCAAAATCGGAGGTAAAACCTACTGACGACACCCTTCC
TAAAGTAGAAGAGGGGAAAGAAGATTCAGCAGAACCATCTCCAGTTGAAGAAGTAGGTGG 
AGAAGTTGAGTCAAAACCAGAGGAAAAAGTAGCAGTTAAGCCAGAAAGTCAACCATCAGA 
CAAACCAGCTGAGGAATCAAAAGTTGAACCACCAGTAGAACAAGCAAAAGTCCCAGAACA 
ACCCGTGCAACCTACACAAGCTGAGCAACCAAGTACACCAAAAGAATCATCACAACAAGA 
AAATCCTAAAGAAGATAGGGGAGCGGAAGAGACACCGAAACAAGAAGATGAACAGCCAGC 
AGAAGCCCAAGAAATCAAGGTTGAAGAACCAGTAGAATCAAAAGAGGAGACTGTTAATCA 
ACCTGTTGAACAACCAAAAGTGGAAACGCCTGCTGTAGAAAAACAAACGGAACCAACAGA 
GGAACCAAAAGTTGAAGTAACAAGTATTCCCCAAACTACTCGCTATGAGGAAGACCTTAC 
TAAGGAACACGGAACGCGTGAAGTTGTTAAGGAAGGTAAGAATGGCAGTAGAACAGTTAC 
TACTCCATATATCTTGAATGCGACAGATGGTACGACTACAGAAGGCACTTCGACAACTGA 
TGAAGCTGAGATGGAGAAAGAGGTTGTTCGTGTTGGCACGAAACCCAAAGAAAAATTAGC 
TCCAGTCTTAAGTTTGACAAGTGTTACAGATAATGCAATGTTGCGTAGTGCGAGACTTAC 
TTATCATTTGGAAAATACAGATAGTGTTGATGTGAAAAAAATTCATGCTGAAATTAAAAA 
TGGCGATAAGGTTGTCAAAACTATTGACTTATCTAAAGAGAGATTATCAGATGCTGTTGA 
CGGTCTTGAACTTTATAAAGATTATAAGATTGTGACGAGTATGACCTATGATAGAGGTAA 
TGGTGAAGAAACCTCTACGTTGGAAGAAACTCCACTACGATTAGACCTCAAGAAGGTTGA 
ATTGAAAAACATCGGCTCTACTAATCTCGTCAAAGTAAATGAGGATGGTACTGAGGTGGC 
AAGTGACTTCTTAACAAGTAAACCTGTGGATGTGCAGAATTACTACCTCAAAGTAACXTC
CCGTGATAATAAAGTTGTTTCCCCTCCCAGTTGAAAAAATTGAAGAGGTGACTGAGGAAG 
GTCCACCACTTTACAAAGTCCCTGCTAAGGCCCTAATTTGAT 

ORF Predictions: 

ORF # Start End Direction Length 



1 77 1474 F 466 aa 

> 3864718-1 ORF translation from 77-1474, direction F 
VLLKMDGYRYVGYLSGDILKTLGLDTVLEETSAKPGEVTWEVETPQSTTNQEQARTENQ 
WETEEAPKEEAPKTEESPKEEPKSEVKPTDDTLPKVEEGKEDSAEPSPVEEVGGEVESK 
PEEKVAVKPESQPSDKPAEESKVEPPVEQAKVPEQPVQPTQAEQPSTPKESSQQENPKED 
RGAEETPKQEDEQPAEAQEIKVEEPVESKEETVNQPVEQPKVETPAVEKQTEPTEEPKVE 
VTSIPQTTRYEEDLTKEHGTREWKEGKNGSRTVTTPYILNATDGTTTEGTSTTDEAEME 
KEWRVGTKPKEKLAPVLSLTSVTDNAMLRSARLTYHLENTDSVDVKKIHAEIKNGDKW
KTIDLSKERLSDAVDGLELYKDYKIVTSMTYDRGNGEETSTLEETPLRLDLKKVELKNIG 
STNLVKVNEDGTEVASDFLTSKPVDVQNYYLKVTSRDNKWSPPS* 

Description : 
unknown 

Assembly ID: 3864802 
Assembly Length: 1321bp

> 3864802 Strep Assembly -- Assembly id#3864802 

ATCGAATTACTTCAACTCCAACTTTACTCTCAATAAAAATCAAATGTAAAAAGAGGAGCT 
AAATTTATCTTTTTCTCCTCCTTCATCGTTCTTACTTTTGACCATAATAAGCATTTGGTC 
CATGTTTACGTTGGTAGTGTTTTTCTAGTATGTACTGGGGAGCAGGTTCAACTCTTGGAT 
TGATTTGTTCTGTAAAGCGATTCATCTTTGATACTTCCTCTAGTACGACAGAGTGATAAA 
CAGCATTCTCTGGATTTTTGCCCCAGGTGAATGGACCGTGATTGCGTACAACAATTCCTG 
GTACTTCAACCGGGTTAAGTCCGCGATGTTCAAACTCTTCTACGATAACCAGGCCAGTAT 
CTTTTTCATAGGCCACTTCTACTTCGTCCTTGGTCAAACTACGGGCGCAAGGGATTGAAC 
CGTAGAAATAATCTGCATGGGTTGTTCCGTAGAAAGGAATATCACGACCTGCCTGAGCCC 
AAGCAACAGCTTCTGTCGAATGGGTGTGAACCACACTACCAATTTCTGACCAAGCCTTAT 
ATAATTGCACATGAGTTGGGAAGTCGGAAGATGGTCTTAAATCCCCTTATAGGATCTTAC 
CATCTAGATCAGTCACTACCATGTTTTCAGGTGTCAATTCGTCATAATCCACGCCTGATG 
GTTTGATAACAATGACACCGAGTTCGCGATTGACTTCAGATACATTCCCCCAGGTAAATT 
TGACAAGTCCATGTTTTGGCAATGATTGATTGGCATCACAGACTCGTTTACGCATAGCAT 
TGATTACTTGATTCATCTTACATCAAACCTGCTTTCTTAATGAGTGGATAGAGAAAAGCT 
TGCGCCTCTTGAATGGCTGCGCGTGTTTCTTCTACTGTTTCACAATTTTCAGACCACATT 
TCGATTAGGAAAGGTCCATTATAATTGGTTTCCTTTAAAATATCGAAAGCTTCTTCCCAT 
TTGACACAACCTTGCCCAAAAGGTACATCTCGGAACTGGCCCTTTGAACTTTCTGTCACT 
GCATAAGTATCCTTGAGATGGAGAGTTGCGATGGCATGATGACCAAGATAAAACTCACTA
TAGATATCATTATGCCATGCAGACACATTACCAATATCTGGATATACAAAGAGGAAGGGA 
GAGTCAATCTCTTTTTCTATAGCCAAATATTTTTCGATGCTATTGATGAAAGGATCATCC 
ATAATTTCAATAGCAAGTACCACCTGAGCTTCTTCAGCCCAGTCACAGGCTTTTCTCAAA 
TTTTTGATAAAACGTTGGCGTGTCTGGGGTGACTTTTCCTCATAGTAAACATCGTAACCA 
G 

ORF Predictions: 

ORF # Start End Direction Length 



1 92 550 R 153 aa 

> 3864802-1 ORF translation from 92-550, direction R 
VQLYKAWSEIGSWHTHSTEAVAWAQAGRDIPFYGTTHADYFYGSIPCARSLTKDEVEVA
YEKDTGLVIVEEFEHRGLNPVEVPGIWRNHGPFTWGKNPENAVYHSWLEEVSKMNRFT 
EQINPRVEPAPQYILEKHYQRKHGPNAYYGQK* 

Description : 

L-RIBULOSE-5-PHOSPHATE 4-EPIMERASE (EC 5.1.3.4). - ESCHERICHIA
COLI.

Assembly ID: 3864854 
Assembly Length: 1265bp

> 3864854 Strep Assembly -- Assembly id#3864854

TTTTTCTGTTTTTCGGAGCAAACTGGGCTCCAGCCGGTTTTGGCCTTCTTTCCTTAGCTA 
CAGCTGGTTTAGCTGGCTCAGATTTTTCGGCTTTCTTTTCTGCACTTACTTTTGGTGCTG 
CAGGTTTTGCTTCTACTTTCGGAGCAGCTGCAGGCTTAAAGCTGGCAGCAATTTTTGCAG 
CGACAGCTTCTTCCACACTTGATGAGTGGCTTTTCACATCCAAGCCCAACTCTTTTGCAC 
GCGCTACAACTTCTTTACTTTCTTTTCCAAGTTCTTTTGCGATTTCGTACAATCTTTTCT 
TAGACAAATCATGTCCTCCTCTTCTATTCCATAAGAGACCTCATTTTCTTTGTAAATCCA 
GCATCTGTTACAGCCAAAACCTTTCTCGATTTCCCGACTGCTATGATTAATTCCAGTGTT 
GAAAACACGGTTACAATTTCTACTTGATAATAATGACTTTTATCTTGAATCTTCTTGGTC 
AGATTGGGTCCAGCATCATGAGCTAGAAAGACCAACTTGGCCTTGCCGTCTTGAATGGCC 
TTGACCACCAATTCTTCACCCGATATGATGCGCCCTGCTCGCTGAGCAAGCCCCAAGAGA 
TTACTTATCTTTTGCTTATTCAAGTCCCAACTCTCTTCTTTTCACTTTGTGATCCACATA 
AGCGATCAACTCGTCATAAAAGCTTTCTTCCACTTCCATGCTAAAGCTGCGGTTAAAGAC 
CTTCTTCTTTTTCGCCTCTAGGGCTTCTGCATTGTCTAGTTTGATATAAGCGCCGCGGCC 
ATTGGCCTTGCCCGTAGGATCAATAAAGACTTGTCCTTCCTTGTTCTTGACAATGCGGAG 
CAAATCACGCTTATCAATCACTTCGTTAGACACAACAGACTTGCGCAAAGGGATTTTTCT 
TGTTTTCATCTTTCCCTCCTCTAGCAGCTTTTATTCTTCTACAGTATCGTTTTCTACTTC 
CAACTCTACTGAAGCAGCGTCTTCCATGGCTTCAAATTCGCTAGCAGACTTGATATCGAT 
ACGGTAACCAGTCAAGTGAGCCGCCAAGCGCACGTTTTGTCCACGACGACCAATGGCAAG 
AGAAAGCTTGTTATCTGGAACAACCACCAAGGCACGTTTGCTGTCGTTTTCATCAAAG^T
AACTTGGTCAACCTCAGCAGGAGCGATGGCATTGTAGATAAATTCAGCTGGATCTGCTAC 
CCACTCGATAACATCGATATTTTCTTCGATTGGTACCATGCGGTCATTTTTAGCATCGTA 
ACGAG 

ORF Predictions:

ORF # Start End Direction Length 



1 324 548 R 75 aa 

> 3864854-1 ORF translation from 324-548, direction R 
VVKAIQDGKAKLVFLAHDAGPNLTKKIQDKSHYYQVEIVTVFSTLELIIAVGKSRKVLAV
TDAGFTKKMRSLME*

Description : 

PROBABLE 11.1 KD RIBOSOMAL PROTEIN IN NUSA-INFB INTERGENIC REGION (ORF4). - BACILLUS SUBTILIS.
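Each "ORF translation from X-Y, direction F/R" header in this listing can be reproduced from the assembly sequence. The Python sketch below is illustrative only and is not the software used to prepare the listing. `translate_orf`, `reverse_complement` and `CODON_TABLE` are hypothetical names. The sketch assumes that positions are 1-based and inclusive (consistent with ORF 3864854-1, where 548 - 324 + 1 = 225 nucleotides gives the listed 75 codons), that direction R means the ORF is read on the strand opposite the one depicted, and that the standard genetic code applies. As in the listing, a GTG start codon is translated with the ordinary codon table and therefore appears as V.

```python
# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string; unrecognised symbols are kept as-is."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate_orf(assembly: str, start: int, end: int, direction: str) -> str:
    """Translate the ORF occupying 1-based inclusive positions start..end.

    direction "F" reads the strand as depicted; "R" reads the opposite strand,
    so translation begins at `end` and proceeds toward `start`.
    """
    region = assembly[start - 1:end].upper()
    if direction == "R":
        region = reverse_complement(region)
    elif direction != "F":
        raise ValueError("direction must be 'F' or 'R'")
    if len(region) % 3:
        raise ValueError("ORF length is not a multiple of 3")
    # Codons containing anything other than A/C/G/T (e.g. N) become 'X'.
    return "".join(
        CODON_TABLE.get(region[i:i + 3], "X") for i in range(0, len(region), 3)
    )


# Positions 481-600 of assembly 3864854 (two 60-nt lines of the listing above).
fragment = (
    "AGATTGGGTCCAGCATCATGAGCTAGAAAGACCAACTTGGCCTTGCCGTCTTGAATGGCC"
    "TTGACCACCAATTCTTCACCCGATATGATGCGCCCTGCTCGCTGAGCAAGCCCCAAGAGA"
)
# Assembly positions 483-548 are fragment positions 3-68; ORF 3864854-1 ends at 548.
print(translate_orf(fragment, 3, 68, "R"))  # VVKAIQDGKAKLVFLAHDAGPN
```

Only a two-line excerpt of assembly 3864854 is embedded to keep the example short. Passing the full assembly with positions 324 and 548 would translate the entire ORF.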

Assembly ID: 3864862 
Assembly Length: 1305bp

> 3864862 Strep Assembly -- Assembly id#3864862 

ATAAACCAAAGGAAGCTGAGCTCTTTAGTCCCAGCTTCTTTTTATATATAAAATTTTACC 
CGTGAAAAGACAGGGCCTTAGCAGACTTCTTTTTTACTTCGTTCACCCTTGCTTTTTCTT 
TGTATGTTTGGGCGTTGGCAGTTGGTTATACATAGCTAAAATCAGGTCTTATAGAAACAT 
CTTATTATCAAGTTCTTCCACTCAAATCATTTCTTTGGCACCTTTGTATGGAAACTCAAA 
AGAAGATTGGTCAATCTTATCTAAGACTGCTTGCACGGGTTTAACTAAAAGCGATCGTCA 
TAAATGCCGCCAATAATCTTGCCGCGGAAGTAAAGAATATACTCCCCCATCATGGAACGG 
TAAGTCACATCATCTAATCCTGATAATTGTTCCAAAACAAATTCCAAATAGTTCTTACTT 
GATGCCATTTCTAATCTTCTAGGCTCTGTTCAACGATAACAACCGTATAGAGTTCTTGCT 
TAACCTCGCATCCAATTGATTTAAAGCCCTGCTTTTCCCAAAAATGCTGAGATTGCGGAT 
TTCCCTTAACATAAGCCAAACGTGCCTTTCGAAAGTTCTTAGCAAAATAAGCTAGTGCTT 
CTGTCACAATATGACTACCAATCCCTTTCCTCTGATAGGCTTGATCAACCATAAACAAAC 
CAATAAAAACAGTCTCCTCATCAGGATATGCATAGACAAAATCCATAACAGCCACAAGGT 
CAAATCCATTCCAAAATCCAACAAAAAACTTATCAGCCTTAGCTTTACCTTCAGGTAGAC 
AAAGCATGTCCTCTTTTACAGTTGCAAAATTTGGCTCTGGTGGACAATGCTGAAAATACA 
GAGGATTACTTTCATATAAAGATAAAATACTTGGAATATCCTTTTCAGTTAGTATCCTAC 
AACTGTAATACTTAGATAGTTGGTCAATCATCTTTTCAAATTCGATACTTTCTTGTGCCC 
TGTGATTATGACACAGGAAGATGCACTGATCGTCATCAGCCACATAAAAGTTCTTTCCAT 
CGTGCCTAATCGTTGTCTCAAACCTTTGGATAAAACCTTTAGCCTATACAACTGGATTTT 
CCTCTCTCAAAAGTATATTCTTTTGCAGGCGAACTTCCTCAAAATCAGTCGTGTGCAACT 
TCAGTAGAATATTCATAGGCTCGGATAATCTGAGCGACAACAGGATGGCGAACCACATCC 
TTGGCTGAAAAATGAACAAAGTCAATCTGATGGATGTTCTTGAGTTTCTCTTGAGCATCA
ATCAAACCGGACTTGACATTACGTGGCAGGTCAATCTGACTAATA

ORF Predictions: 

ORF # Start End Direction Length



1 431 1003 R 191 aa 

> 3864862-1 ORF translation from 431-1003, direction R 
VADDDQCIFLCHNHRAQESIEFEKMIDQLSKYYSCRILTEKDIPSILSLYESNPLYFQHC 
PPEPNFATVKEDMLCLPEGKAKADKFFVGFWNGFDLVAVMDFVYAYPDEETVFIGLFMVD 
QAYQRKGIGSHIVTEALAYFAKNFRKARLAYVKGNPQSQHFWEKQGFKSIGCEVKQELYT 
VVIVEQSLED*

Description : 
unknown 

Assembly ID: 3864888 
Assembly Length: 1742bp

> 3864888 Strep Assembly -- Assembly id#3864888 

CTAATCTCCTTAAAACGTGATCTTTTCAAGAATATTTTTATCTAAACAATCCAGCAAGTC 
TTGGTAAGAATAGACTTCGTAAGTCGGCTGGGCTTGTGTGTGATTTTCGAGGTGATGAGG 
ATTATACCAGATAGTGTCAATCCCCGCATTATTGCCACCTTGAATGTCGGCGGTTAGAGA 
ATCTCCAATCATCAGCGTCTTTTCTTTACTAAATCCAGCAATTTGCTGGCCAATCTTTTC 
ATAAAAAAGAGCATCCGGCTTTTGAGTTTGCAACTGTTCTGAGATAAAGACTTGATTGAA 
ATAAGGTGCTAGACCAGATTGAGCCAAACGTCCTGTCTGAATGGCAGTAATGCCATTTGT 
CGCAGCATACAAGTTATAATCACGCTCAATGAGGCTGTCCAAGAGATCATGAGCGCCCGA 
TAGTGTTTGTCCCTGCTGGGCGAGGTAAAATTGGTAACGCTGGGCAAGAAAACTACCGTC 
TTTTTCCTGTCCAAAATGAGCAAATAAACGAGAAAAGCGCGTGTTAACCAGCTCTTGTTT 
ACTGATTTTCTTCAGCTCCAAGTCTTTCCAGAGAGCCTTGTTCATAGGAACGTAATAATC 
TTTATAAGCCGGAATATCCGCAACTCCTTCTTCTTTTAGAAGTGGAGTCAAAGCCACATC 
CTCAGCAGCATCAAAATCAAGAAGAGTGTGGTCGAGGTCGAAGAGTACAAATTTGTAGAA 
CAATTTGAGGTTTTCCTTTCTGAAAATTCATTAAGAACATTATATCATAAAGCACCTCAT 
ACAATTAACTAATTTAATCACTTAAAAAAAATTCGAACACTTTCTATACAACTGACAGCT 
CAAATCTTTCAGAATAGAACAATACTAACTATCGAACACCCCGTCTTCATAAATACATAT 
GTAATTCTAGGCCTAGAATTCCTATAAACTAAATGCTTTCATACTCTTCCAAGTAATTGA 
TTGCCTTAAATTTTAATTTTTGAAGGTTTCTAAAGCTAGAATAGCCCCATCACAATCAGT 
TTTGATTGATTCACAATTTAGAAACACTATAGTTTCACTCCTGTTAAAATAAAAAGGAAC 
TGCATAAAGCAATCCCTTTCTGATTTTGAAATCATTTACTTAACATTTTATAGTTGAGAT 
AATCAATAGCTTATCTATAAAAAGAGTTATAGTAAAATTCCTTATTTATTGATTCCAAGC 
TCCGCTAACTGTATTTGAATAACTGACAGTTCTGCACCAGCCTGAAAAAGAGCAGCTGCA 
TTATAGGCACCTTCTACAATTGGAACCCTGTTGATGATGATACTTTTATCACTGAAATCA 
GTCACCATTTTTAAGTTCATTTTAGCAGAACCTAGGTCAAAAAAGGCAAGTAAAGTATCT
GCTGGATTTTCGGAAACAACCCTATCTACTTGATCAAAACTCGTTCCAATTCCTCCGC.ee
TCGGTTCCTCCTACATAAGTAATCGGAACATCTTTAGCTACTTTACTAATCAGTTCAACA 
ACACCTTCTGCAATGTGTTTGGAATGTGAAACGATAACAAGACCAATACCAATACTTTCC 
ATCAAACCACTCCAGTTTCTAAAATAGCAGTAAAGAGTAATCCTGATGAGAATGATCCAG 
GATCAATATGTCCAAGAAACCACATGCTCCTAAGACAAGAGCTAACAGACTGGCCATCAA 
TAATAGTATTGTTCTTTTTTTCATCATTACTCCTTAACTAGTGTTTAACTGATTAATTCG 
AT 

ORF Predictions: 

ORF # Start End Direction Length 



1 10 657 R 216 aa 

> 3864888-1 ORF translation from 10-657, direction R 
VALTPLLKEEGVADIPAYKDYYVPMNKALWKDLELKKISKQELV1SITRFSRLFAHFGQEKD 
GSFLAQRYQFYLAQQGQTLSGAHDLLDSLIERDYNLYAATNGITAIQTGRLAQSGLAPYF
NQVFISEQLQTQKPDALFYEKIGQQIAGFSKEKTLMIGDSLTADIQGGNNAGIDTIWYNP 
HHLENHTQAQPTYEWSYQDLLDCLDKNILEKITF* 

Description : 
unknown 

Assembly ID: 3864898 
Assembly Length: 1136bp

> 3864898 Strep Assembly -- Assembly id#3864898 

GTGGAATGCGGGGACGCCTTGTCTAATTTTGGATCAAGCCCTGAGTTTGACACAGGGAAA 
TGAGCTGGACGGACTGCTATCTCTGAAGAAATTACTGGCACCATTAGCCTATCAGCCTTG 
GATGATTATGTGGCGGCCTTGTCTCAACAGGATGTTCCCAAAGCTTTGTCTTGCTTGAAT 
CTTCTTTTTGACAATGGTAAGAGCATGACTCGTTTTGTGACCGATCTTTTGCACTATTTA 
AGAGACTTGTTAATTGTTCAAACAGGGGGAGAAAATACTCATCATAGTTCAGTCTTTGTA 
GAAAATTTGGCACTTCCTCAAAAAAATCTGTTTGAAATGATTCGCTTAGCAACAGTGAAT 
TTAGCAGATATTAAGTCTAGTTTGCAGCCCAAGATTTATGCTGAAATGATGACCGTCCGT 
TTGGCGGAAATCAAGCCCGAACCAGCTCTATCAGGAGCGGTTGAAAATCGAATTGCTACG 
CTGAGACAGGAAGTTGCCCGTCTCAAACAAGAGCTTTCTAATGCAGGTGCGGTTCCTAAA 
CAAGTTGCACCAGCTCCTAGTCGACCAGCTACGGGCAAAACAGTCTATCGTGTCGATCGC 
AATAAAGTGCAATCTATCTTACAAGAGGCCGTCGAAAATCCTGATTTAGCACGTCAAAAT 
CTAATTCGTTTGCAGAATGCCTGGGGAGAGGTAATTGAAAGTCTAGGTGGGCCGGACAAG 
GCTCTGCTAGTTGGTTCTCAACCGGTTGCTGCCAATGAACACCATGCTATTCTTGCTTTT 
GAGTCTAACTTCAATGCTGGTCAAACTATGAAACGAGACAATCTCAATACCATGTTTGGT 
AATATCCTCAGTCAGGCGGCAGGTTTTTCACCTGAGATTTTAGCTATTTCCATGGAGGAA 
TGGAAAGAAGTTCGCGCAGCCTTTTCAGCCAAAGCCAAATCTTCTCAAACTGAAAAAGAA 
GTAGAAGAAAGCCTGATTCCAGAAGGATTTGAATTTTTGGCTGATAAAGTGAAGGTAGAG 
GAAGACTAAAGAAAGATTTCATGATACAATAAGTTTATGAATAAACAACAATTTATTATT
ATGGCGCTATTTACAGCTGCTGAGACCTATTTTTTCAATGAAGCCTGGATGACTGG 

ORF Predictions: 

ORF # Start End Direction Length 



1 130 1029 F 300 aa 

> 3864898-1 ORF translation from 130-1029, direction F 
VAALSQQDVPKALSCLNLLFDNGKSMTRFVTDLLHYLRDLLIVQTGGENTHHSSVFVENL 
ALPQKNLFEMIRLATVNLADIKSSLQPKIYAEMMTVRLAEIKPEPALSGAVENRIATLRQ 
EVARLKQELSNAGAVPKQVAPAPSRPATGKTVYRVDRNKVQSILQEAVENPDLARQNLIR 
LQNAWGEVIESLGGPDKALLVGSQPVAANEHHAILAFESNFNAGQTMKRDNLNTMFGNIL 
SQAAGFSPEILAISMEEWKEVRAAFSAKAKSSQTEKEVEESLIPEGFEFLADKVKVEED* 

Description : 
unknown 

Assembly ID: 3864938 
Assembly Length: 1670bp

> 3864938 Strep Assembly -- Assembly id#3864938 

CTGTCTCTGAAACAGTCACATCAAGTGCCTCTGAACAANCGCCCCNCCTAGGTNGACGGT 
ATCGATAAGCTCGATCTGTGATTTCAGAGAAGAAATCAAGTGCTGTAACAGAAGTAAGAT 
GTAATTGTATGTAAAGGAGACGTCATGTTAAATAGTATTGTAACCATTATTTGTATTGCC 
CTTATCGCGTTTATCTTGTTTTGGTTTTTCAAAAAGCCTGAAAAATCTGGACAAAAAGCC 
CAGCAAAAAAACGGATACCAAGAGATTCGAGTGGAAGTCATGGGAGGCTATACTCCTGAG 
TTGATTGTCCTCAAGAAATCAGTGCCAGCCCGCATTGTCTTTGACCGCAAGGATCCTTCA 
CCATGTCTGGATCAAATTGTTTTTCCAGATTTTGGTGTACATGCGAACCTGCCAATGGGG 
GAAGAGTATGTAGTGGAAATCACGCCTGAACAGGCTGGAGAGTTTGGCTTTGCTTGTGGT 
ATGAACATGATGCACGGCAAGATGATTGTAGAGTAGGTGGAGACTATGACAGAAATTGTG 
AAAGCAAGCTTAGAAAATGGCATTCAAAAAATCCGTATCCGAGCTGAAAAAGGCTATCAT 
CCAGCCCATATCCAGCTTCAAAAGGGAATTCCAGCTGAGATTACCTTTCATTCGTGCTAC 
TCCTTCAAACTGTTATAAGGGAAATTCTGTTTGAAGAAGAAGGTATCTTGGAAGCAATCG 
GCGTAGATGAGGAGAAAGTCATTCGTTTTACACCTCAAGAATTAGGGAGACATGAATTTT 
CTTGTGGCATGAAGATGCAAAAGGGAAGCTATATAGTCGTTGAGAAGACTCGAAAATCTC 
TATCTCTCCTGCAAACGTTTTTGGATTACTAGTATCTTTACTGTGCCTCTTGTGATTCTC 
ATGATTGGGATGTTGGCAGGTAGCATTAGTCATCAAGTCATGCATTGGGGAACCTTTTTA 
GCAACAACGCCTATTATGTTAGTTGCGGGTAAGCCATATATCCAGAGTGCTTGGGCCAGT 
TTTAAAAAGCACAATGCCAACATGGATACCTTGGTTGCGCTGGGAACTCTAGTGGCTTAT 
TTCTATAGCCTAGTTGCTCTCTTTGCTGGTCTCCCTGTTTACTTCGAAAGTGCTGGATTT 
ATCCTCTTTTTCGTTCTTTTGGGAGCAGTTTTTGAGGAAAAAATGAGGAAAAATACGTCC 
CAAGCTGTGGAGAAATTACTGGACTTGCAAGCTAAAACCGCAGAAGTCTTGAGTGATGAT 
AGTTATGTCCAAGTTCCTTTGGAACAAGTCAAGGTACGCGACCTTGATTCCAGTGCQXCC
CGGTGAAAAGATTGCTGTTGATGGTGTCGTAGTAGAAGGTGTCTCTAGTATTGACGAATC 
CATGGTGACAGGTGAGAGTCTGCCTGTGGACAAGACAGTTGGAGATACTGTCATTGGCTC 
AACCATCAATCATAGTGGAACGCTTGTCTTTAGAGCAGAAAAAGTTGGCTCAGAGACTGT 
TTTGGCTCAGATTGTAGATTTTGTGAAGAAAGCTCAGACAAGTCGTGCGCCGATTCAGGA 
CTTGACGGATAAGATTTCAGGGATTTTTGTCCCAGTAGTTGTCATTTTAGGAATCATGAC 
CTTTTGGGTTTGGTTCGTCTTGCTCAGGGATAGTGTGGTCGTGCTTGGAG 

ORF Predictions: 

ORF # Start End Direction Length 



1 883 1326 F 148 aa 

> 3864938-2 ORF translation from 883-1326, direction F 
VPLVILMIGMLAGSISHQVMHWGTFLATTPIMLVAGKPYIQSAWASFKKHNANMDTLVAL 
GTLVAYFYSLVALFAGLPVYFESAGFILFFVLLGAVFEEKMRKNTSQAVEKLLDLQAKTA
EVLSDDSYVQVPLEQVKVRDLDSSASR* 

Description : 
ATCS_SYNP7

Assembly ID: 3864956 
Assembly Length: 1252bp

> 3864956 Strep Assembly -- Assembly id#3864956 

ACAAGAACAATTGGAACAGGTACAGGCTGTTAAAAAATCGATTAACACAGCTAGTGAAGA
AGTGAAAAACCAAGTCTTGCTACCCATGGCTGATCACTTAGTGGCTGCTACTGAGGAAAT 
TTTAGCGGCTAATGCCCTCGATATGGCAGCGGCTAAGGGGAAAATCTCAGATGTGATGTT 
GGATCGTCTTTATTTGGATGCAGATCGTATAGAAGCGATGGCAAGAGGAATTCGTGAAGT 
GGTTGCCTTACCAGATCCAATCGGTGAAGTTTTAGAAACAAGTCAGCTTGAAAATGGTTT 
GGTTATCACAAAAAAACGTGTAGCTATGGGGGTCATCGGTATTATCTATGAAAGCCGTCC 
AAATGTGACGTCTGATGCGGCTGCTTTGACTCTTAAGAGTGGAAATGCGGTTGTTCTTCG 
TAGTGGTAAGGATGCCTATCAAACAACCCATGCCATTGTCACAGCCTTGAAGAAGGGCTT 
GGAGACGACTACTATTCATCCAAATGTGATTCAACTGGTGGAGGATACTAGCCGTGAAAG 
TAGTTATGCTATGATGAAGGCCAAGGGCTATCTAGACCTTCTCATTCCTCGTGGAGGAGC 
TGGCTTGATTAATGCAGTAGTTGAGAATGCCATTGTGCCTGTTATCGAGACAGGAACTGG 
GATTGTCCATGTTTATGTCGATAAGGACGCAGATGACGACAAGGCACTGTCTATCATCAA 
CAATGCCAAAACCAGTCGTCCTTCTGTCTGCAATGCCATGGAGGTTCTGCTGGTTCATGA 
AGACAAGGCAGCAAGCTTCCTTCCTCGCTTGGAGCAAGTGCTGGTTGCAGATCGAAAAGA 
AGCTGGGTTGGAACCAATTCAATTCCGCCTAGATAGCAAAGCAAGCCAGTTTGTTTCAGG 
TCAAGCTGCTCAAGCACAAGACTTTGATACCGAGTTTTTAGACTATATTCTAGCTGTTAA 
GGTTGTGAGCAGTTTAGAAGAAGCGGTTGCGCATATTGAATCCACAGTACCCATCATTCG 
GATGCTATTGTGACGGAAAATGCTGAAGCTGCAGCATACTTTACAGATCAAGTGGACTCT 
GCAGCGGTGTATGTTAATGCCTCAACTCGTTTCACAGATGGAGGACAATTTGGTCTTGHT
TGTGAAATGGGGATTTCTACTCAGAAATTGCACGCGCGTGGTCCAATGGGCTTGAAAGAG 
TTGACCAGCTACAAGTATGTGGTTGCTGGTGATGGGCAGATAAGGGAGTAAG 

ORF Predictions: 

ORF # Start End Direction Length 



1 1030 1251 F 74 aa 

> 3864956-2 ORF translation from 1030-1251, direction F 
VTENAEAAAYFTDQVDSAAVYVNASTRFTDGGQFGLGCEMGISTQKLHARGPMGLKELTS 
YKYVVAGDGQIRE*

Description : 

gamma-glutamyl phosphate reductase (proA) homolog - Haemophilus influenzae (strain Rd KW20)

Assembly ID: 3864958 
Assembly Length: 1785bp

> 3864958 Strep Assembly -- Assembly id#3864958 

CTGCCCTAGCAGGAACGCAAGAAGGAACTGGAGAATAGGCATTTTCAAAATTATAACCTA
CACTAGCCATCATATCTAATGTTGGAGTGCTAACTAGCTTATCCTTACTATTCAAGGATA 
AGGCGTCTGCTCTCATTTGATCTACAACAATCAAAATAATATTTGGTTGTTTTGTCTGAA 
CCATAAAATCTCCTTTCTAATATGGCAAAAGAGGCACAAGAAGATATCTACCTTTACTGC 
ACCCCTTTCTATATCAATCTCTCTATATAAAGCAATAACATTCTTGTTATGTTTTATAGA
ACAATGGACTAAAATATGACTAAATCGATTAGGAAATTCAAATCATTTTCTAGTACTGTT 
TTAGTAAGTTACAGTGTACTATTCCAACTTCAATAAATTATAAACCTTTGTCTAATAACA 
ATTTTAGTGGAGATAAGAAATCCTACACCTAACTCATCTTACACGTAATCTATTTCTATT 
TTATCACAAAAAACGCAAGTAAGACCATTAACTCAATTCAGTTTTATCTGCCATTTTCAC 
AAATGGGAAATAAGTCAAGACACTAATAATCAAACAAACAACTGATAAGATGATGGCACG 
CCAATCAAATGCTGTAGAGAAGAAACCATATAAAATTGGAGGCATTACCCAAGTAACATT 
TTGTGTAACAGGTGAAACAAGACCCCAGCTTGTTGCCCAGTAAGCTACCGTTGCCATGAA 
AACCGGGCTAAGTACAAATGGTATAAATAGCAAAGGATTCAAGACAACTGGTAAACCATA 
ATTCGATACCGGCTCACCAATATTAAACAGAACTGGTGCTAGACCAAGTTTAGCAACTTT 
TCGATAATGACTGTTTCTTGAAAAAATTAAAATAGCAAGTACTAATCCTAATCCTCCAAA
CCAGACAAACGCCCCAAAAGACCCACTTGTCCATATATAAGGAATCGGTTCACCTTTTTG 
GAAAGCATCCAGATTCGCTAACATAGCAACTCCAAATAGCCCTTCCATGATGGGAGCCAA 
TACATTTCCTCCATGGAGACCAAAAAACCAGAATAACTTATTCAAAAAGATCATCAGAAT 
AACTGCAAAGAAACTTTGAGACAAACCTAGTAATGGCGTTTGTAACACCTTGTAAACCCA 
ATCAATCAATAAGTCATTGCTAAGTAAATGGAAAACATAAGTCAAGATGGCTACTATATA 
CATCGCCATAAATCCTGGAATGATAGAAGTGAACGGCTTAGCAATCGCAGGGGGAACTGA 
ATCTGGTAACTTGATTACCCAGTTCTTTTTCATTACTTTACAGAAAATAATAGAGGCTAA 
AAATCCAATCATCATGGCTGTAAAGTAGCCTCTGGCATTAATATGGTTTCCTGGAATCI^C
ATTCCCAATAGTTACCATCAGATTTTTACCATCAAATGCTAGATTATCAATTCCATGTTA 
AGATTTGATCTAATTTCACATCTCCTACATTTGCCAAAGGGAAACTCTTTGTAACTGTAC 
TTCCAATCGAAATGACAAACGAAGCAAGTGATACCAAACCAGCAGAAACTGTATCAACCT 
TGTAAATCTTAGCGATATTCACTCCCAAGCAATAGATGAACAACAAGGAAACAATTGGTA 
TACTTCCCTTGAATACCAAATTATTGATGTCAACAAGCCACTGAAAGGTTTTCGTAATAC 
TTCCTAGGTGAAATTGTTGTGGTAAATCCACTAGAAAAGCATTTAATAACAAAGCAATGG 
AACCTGTCATAATAACAGGCATAGTCCCCACAAATGAATCACGTT 

ORF Predictions : 

ORF # Start End Direction Length 



1 1427 1711 R 95 aa 

> 3864958-2 ORF translation from 1427-1711, direction R 
VDLPQQFHLGSITKTFQWLVDINNLVFKGSIPIVSLLFIYCLGVNIAKIYKVDTVSAGLV
SLASFVISIGSTVTKSFPLANVGDVKLDQILTWN* 

Description : 
unknown 

Assembly ID: 3865022 
Assembly Length: 1386bp

> 3865022 Strep Assembly -- Assembly id#3865022 

ATCGAATTTCATTTCTATTTCCTATTCCATTTTTATTCAAAAAATCAAAAAGCAAACTAG 
AAAGCTGGTCGCTGGTGGTTCAAAACACTGTTTTGAGATTGTCAATAGAACTGACAAACC 
CTGTAATATACCTGCATATATACATACGACAAGGCGATACTACCCTAGTTTGAAGAGATT 
TTCGAAGAGTATTCATTTTTGTCTTTTACTTATTATACCATATTCACATAAAAAAACGAA
CATTCTTATCCTAAAAAATGCTCATTTTTCTTAAATTATCAATCTAAATCTGGTTTATAG 
AAGGAACGATTATCCATAGCGAAGATTTTATTGGTCATCTCTCCTTTATCCACCAAAGCC 
AGAGCTGTTGACATCATCATCATGCTTGCATCCAGATTGTCAATCATATGGATAATCTCT 
GCCTCCATAATACGTGGACGGACTGGAATTTCCATATTCAAGCAAGCCGTGGTGGACTTG 
AGGATGACATGACGAAGCAAAACGACTTCTTCCTTGGTATCATCGATGCCGAGTTCCATA 
ACTGTCTTGGTAATTTCGCTATCAATGAGAGCGATATGTCCAAGAAGATTACCTCGCACT 
GTGTACTCTGTCTGGTCTGGCCCCGTCAACTCGATAACCTTAGCTAAGTCATGCAGCATA 
ATCCCCGCATAGAGCAGGCTCTTATTGAGCTGAGGATAAACTTCGCTAATAGCGTCTGCC 
AAACGTACCATGGTCGCCGTATGATAAGCCAACCCCGTTTCAAAGGCATGGTGGTTGGTC 
TTGGCGGCTGGATAGGAGTAGAATTCCTTATCATACTTGGTGTAGAGATTTCGGACAATC 
CGTTGCCAGACAGGATTTTCAATTTTGAAAATCATTTGCGACATGTAGTCACGAATTTCC 
TTGACATCAACTGGTGACTTGACCTTGAAATCAGCTGGGTCATTGGGTTCACCAGCTTGA 
GGCAGGCGGAGAGTAATTTGATTGACTTGAGGGGTATTGTTATAAACTTCTCGGCGTCCT 
TTCATGTGGACAACCTTACCTGCGGTAAAGGCCTCAATGTTATGAGGTTGGGCATCCCAG 
AGCTTCCCATCAATCTCGCCACTATCATCTTGGAAGGTAAAGGCTAGGTAGTTTTTCCCA
GCTCGAGTTTGCCTCAGGTCAGCTGATTTGATTAGGTAAAAGCCTTCAAATAACTCATCT 
TTTTTCATGTGACTAATCTTCATATTCTTCCTCATTTTCTTGAAAATGGAGTAGATCAAG 
CGCAGGCTCACCTTCTGACAACTCAATGTGACGGAGCGTCCGCTCGATAGCTATGGTACG 
ACGGTTTAATAATTCGATCAATATTGCCAGAGGCATGTTGGAGATGTTTTTGTGCCTTGA 
CCAGAA 

ORF Predictions : 

ORF # Start End Direction Length 



1 279 1271 R 331 aa 

> 3865022-1 ORF translation from 279-1271, direction R 
VSLRLIYSIFKKMRKNMKISHMKKDELFEGFYLIKSADLRQTRAGKNYLAFTFQDDSGEI 
DGKLWDAQPHNIEAFTAGKVVHMKGRREVYISnNrTPQVNQITLRLPQAGEPNDPADFKVKSP 
VDVKEIRDYMSQMIFKIENPVWQRIVRNLYTKYDKEFYSYPAAKTNHHAFETGLAYHTAT 
MVRLADAISEVYPQLNKSLLYAGIMLHDLAKVIELTGPDQTEYTVRGNLLGHIALIDSEI 
TKTVMELGIDDTKEEWLLRHVILKSTTACLSTALALVDKGEMTNKIFAMDNRSFYKPDLD*

Description : 

gi 1 710422 (U21636) cmp-binding-factor 1 [Staphylococcus aureus]

Assembly ID: 3865036 
Assembly Length: 1167bp 

> 3865036 Strep Assembly -- Assembly id#3865036 

CTCAGATTACAGAGGACAATCAACTGGTTCATTTTCGTTTCCAGTTTCAAAAAGGCTTAG 
AAAGGGAGTTCATCTATCGTGTGGAAAAAGAAAAAAGTTAAGGCAGGTGTTCTCCTCTAC 
GCAGTCACCATAGCAGCCATCTTTAGTCTTTTGTTGCAATTTTATTTGAACCGACAAGTC 
GCCCACTATCAAGACTATGCTTTGAATAAAGAAAAATTGGTTGCTTTTGCTATGGCTAAA 
CGAACCAAAGATAAGGTTGAGCAAGAAAGTGGGGAACAGGTTTTTAATCTAGGTCAGGTA 
AGCTATCAAAACAAGAAAACTGGCTTAGTGACGAGGGTTCGTACGGATAAGAGCCAATAT 
GAGTTTCTGTTTCCTTCAGTCAAAATCAAAGAAGAGAAAAGAGATAAAAAGGAAGAGGTA 
GCGACCGATTCAAGCGAAAAAGTGGAGAAGAAAAAATCAGAAGAGAAGCCTGAAAAGAAA 
GAGAATTCCTAGTCAATTCAACTATAATGCGTTGAATCCAGAATAGTCCACTGTAGTTTC 
TAGAAAATTGCTGGAAATGGATGTTAAGCTCCAATTCATTTGTTTATATCTTATTTCAGT 
CCACTATACTTTGTGCTAAATTAAAGATATGAAACATGATTTTAACCACAAAGCAGAAAC 
TTTCGATTTCCCTAAAAATATCTTCCTCGCAAACTTGGTATGTCAAGCAGCCGAGAAACA 
GATTGATCTTCTATCAGACAAAGAAATTTTAGATTTCGGTGGTGGCACGGGTCTATTAGC 
CTTGCCCCTAACCCCTAGCCAAGCAGGCTAAGTCAGTCACTCTTGTAGACATTTCTGAGA 
AAATGTTGGAGCAAGCTCGTTTGAAAGTGGAGCAGCAAGCAATCAAGAATATCCAGTTTT 
TGGAGCAAGATTTACCGAAAAATCCCTTGGAGAAAGAGTTTGATTGCCTTGCTGTTAGTC 
GGGTTCTTCATCATATGCCTGATTTGGATGCGGCTCTCTCACTGTTTCATCAACATTIDA
AGGAAGATGGGAAACTCATCATTGCTGATTTTACCAAGACAGAAGCTAATCATCATGGAT 
TTGATTTAGCTGAACTGGAAAACAAGCTAATTGAGCATGGGTTTTTCATCTGTGCATAGT 
CAGATNCTCTATAGCGCTGAAGANCTG 

ORF Predictions: 

ORF # Start End Direction Length 



1 79 492 F 138 aa 

> 3865036-1 ORF translation from 79-492, direction F 
VWKKKKVKAGVLLYAVTIAAIFSLLLQFYLNRQVAHYQDYALNKEKLVAFAMAKRTKDKV 
EQESGEQVFNLGQVSYQNKKTGLVTRVRTDKSQYEFLFPSVKIKEEKRDKKEEVATDSSE 
KVEKKKSEEKPEKKENS*

Description : 
unknown 

Assembly ID: 3865054 
Assembly Length: 916bp 

> 3865054 Strep Assembly -- Assembly id#3865054 

TCTCCCAACATATAATTTCCGTTTTCCAATCCCCCAGCTGTCATACAGTCTGTGATAAGA
GCGATGTTTTCTGTTCCTTTTTGTTTGATAAGAATTTCGCAAGCCTTTGGATCTACGTGG
TGACCATCACAGATCAACTCTGCATAGGTATGTGGCAATTGGTACATGGCTCCAACCATA
CCCAATTCACGGTGAGTCAACCCACGCATTCCATTGTAGGCATGCACCCAAACACTCGCT
CCAGCATCGACTGCTTTTTTGGCTTCATCAAAAGTCGCGTTTGAATGTCCAAGAGCAACC
GTCACACCTTCGCCCGTAACTGTACGAACAAAGTCTTCCACCCCATCACGTTCTGGTGCA
ATCGAATTTTATTAAGCAAGCCATTTGCCGCTTTTTGCCAAGAATGAAACTCCTCAACAC
CCGGGTCTCTCATATAAGTTGGATTTTGTGCCCCCTTAAAAGTTTCTGTGAAATATGGAC
CTTCATAATAAATCCCACGAATCTTAGCACCTGTTGCTTCTTTATAATGGTTTCCAAGAT
TTTCAGTGACTGCAAGCAATTGCTCATAAGTGGCTGTTAAAGTTGTGGGTAAGAAACTGG
TAACACCGGTACTAAGAAGTCCTTCACTCATAGTATGCAATGTACCTTCAATGTTGTTGT
CCATCACATCTACACCTGCATATCCATGAATATGAGTATCCACAAGACCTGGGGCAATGC
TATAACCTGTATAGTCAATCACCTCAGCCCCTTCAGGAATCTGCTCTACATGTTTCCCAA
ACTTGCCGTCCACAAGTTCCAAGTAACCACCTCGACAAATCCGTGTGGGTAGAAAAACTG
ATCCGCTTTAATATAGTTAGGCATAATGTTAACCTCCTTAAAAGATTGATTCTACAATTT
ATTATGTCAATTCGAT

ORF Predictions: 

ORF # Start End Direction Length 



1 302 793 R 164 aa 

> 3865054-1 ORF translation from 302-793, direction R
VDGKFGKHVEQIPEGAEVIDYTGYSIAPGLVDTHIHGYAGVDVMDNNIEGTLHTMSEGLL 
STGVTSFLPTTLTATYEQLLAVTENLGNHYKEATGAKIRGIYYEGPYFTETFKGAQNPTY
MRDPGVEEFHSWQKAANGLLNKIRLHQNVMGWKTLFVQLRAKV* 

Description : 

N-acetylglucosamine-6-phosphate deacetylase (nagA) homolog - Haemophilus influenzae (strain Rd KW20)

Assembly ID: 3865102 
Assembly Length: 786bp

> 3865102 Strep Assembly -- Assembly id#3865102

CTGGATTAAAACGAGGCAGTTTCAGACTAATATCCAAGTCGTAAGAAATGCCTGAAATAA 
GCTTTTCTAAATTGTCCAAAGCTTGCGGGAAAACGCTCTTGGAATAGTTTCTCTAAAGAA 
CTTGCTGATATAAAGACATCTTGTCTCGAACGCAAGGGAACTTCTCTGAGCGGTAGATTT 
TCTTTAATCGCTGTTAAAACTTGAAGAACTTCTCTATCCCTGCTTTCAAAAGCGTTGACC 
CGATAAAGAGGTAAGATAGGATGATGAAATTCGCTTGCTAGTGTTTCTGGATAAACCCCT 
ATATAGTAATCACAGCCTAGTTCTAACGACTCAACTCTATCAAAATAAGGCACAATGACC 
GCGATATCCTCCAGGTACTGGGACAGGACTGACCAAGTTTTCTCCCCCTGCATCTTGGCT 
GTCGAAAGCTTCATCAACTGCTGATAGCCCACACTAGATAGAGCTAAAAAGCGCAAATTC 
ACTTCCTGATCATCTACAAACACTGTCATTTCAAGCCCTAGCAAAGGATGAATGCCGTAT 
TTTTTTGTAATCTCTAGAAAGTCGAAAGCGCCATAAAGATTGTCAATATCCATCATAGCC 
AAATGAGTGTAGCCGTATTCTTTAGCTGCTCTCACATACTTTTCGATCGAAATGACGCTT 
TCCATAAAACTATAGACTGTTTTTGTATCTAGTTGTGCGATCAATTTACACTTCTCCTCT 
ATCCTTCTCACTATATTATACCATTTTCACCTATAAATGGCTTCTCTTGAGAAAAATTTC 
GATCAG 

ORF Predictions: 

ORF # Start End Direction Length 



1 27 731 R 235 aa 

> 3865102-1 ORF translation from 27-731, direction R
VRRIEEKCKLIAQLDTKTVYSFMESVISIEKYVRAAKEYGYTHLAMMDIDNLYGAFDFLE
ITKKYGIHPLLGLEMTVFVDDQEVNLRFLALSSVGYQQLMKLSTAKMQGEKTWSVLSQYL
EDIAVIVPYFDRVESLELGCDYYIGVTPETLASEFHHPILPLYRVNAFESRDREVLQVLT
AIKENLPLREVPLRSRQDVFISASSLEKLFQERFPASFGQFRKAYFRHFLRLGY*

Description : 
unknown 

Assembly ID: 3865156 
Assembly Length: 1213bp

> 3865156 Strep Assembly -- Assembly id#3865156 

CACTTTCAGCTTCTTCTCTTTTTGAACGGTTATAAACACGAATCAGATTCCCTATTTCTT
GCGATTTATGTGATTCCTTATTTTCCAATCTAAAGTATAGTGAAATGAAATAAAACATGC 
GCAAATCGATTAAGGAATTTAATCTAATTTCTAACAATGTCTTAGAAATCAAAGTGTACT 
ATTTTAACTTCAATGCACTAAACATCTAATACTCAATAAAAATCAAAGAGCAAACTAGGA 
AACTAGCCGCAGGTGGCTCAAAACACTGTTTTGAGGTTGTAGATGAAACTGACGAAGTCA 
GTAACCATACATACGGCAAGGCGACGCTGACGTGGTTTGAAGAGATTTTCGAAGAGTAGC 
AAAATGGAAAAAGGAGTGAGTGAAGCACATCGCCTCCCCACTCCTTTTTCTGTTTTTAGG 
CTGTTTTTTCAACCTTCAAGATTTTTACATCATAGCTACCAACAGGCGTTTCAATGGTTG 
CTGTATCACCTGTTTTCTTGCCAATCAAGGCCTGCCCAATTGGGCTTTCATTTGAAACCT 
TACCTGCAAAGGCATCCGCACCAGCTGAACCTACGATAATATAAACTTCTTCTTCGTCCT 
CACCAATTTCTTGGATGGTGACTGTTTTACCAATCGCTACTTCGTCCTGGGCAACTGCGT 
CGCTATTGACGATTTCAGCATAGCGGATTTTTGTTTCTAAGCTAGAGATTTGTCCTTCGA 
CAAAGGCTTGTTCATCCTTAGCTGCTTCGTACTCACTGTTTTCTGAAAGGTCACCGTATG 
AACGGGCAATCTTAATGCGTTCTACCACTTCTGGTCGACGAAACCAATTTCAATTCTTCT 
AATTCTTTTTCAAGTTTTTCCTTTTCCTCAAGGGTCATAGGATATGTTTTTTCTGCCATT 
TTTCTCAACTTTCTTCTGATAATATTTTCTAAAGAAAATTATGTGAAGTATCACATAATT 
TTAGTTTGTTTAGTTTAATTTGCTGTTGACATGTTCAGCGACATTGCGGTCGTGGTCTTC 
TTGATTGTTAGCATAGTAAACCTTGCCTTCTGTGACATCTGCTACAAAGTAAAAGTTATC 
GCTCTTAGTTTGATTGATGCTTGACTCAATCCGCATCCAAGACTTGGACTATCGACTGGA 
CCAGGCATGAGACCTACATTTTTATAAACATTATAAGGTGAATCAATGTTGGTATCAATC 
GCAACATCCTCAG 

ORF Predictions: 

ORF # Start End Direction Length 



1 416 808 R 131 aa 

> 3865156-1 ORF translation from 416-808, direction R 
VVERIKIARSYGDLSENSEYEAAKDEQAFVEGQISSLETKIRYAEIVNSDAVAQDEVAIG
KTVTIQEIGEDEEEVYIIVGSAGADAFAGKVSNESPIGQALIGKKTGDTATIETPVGSYD
VKILKVEKTA* 

Description : 

TRANSCRIPTION ELONGATION FACTOR GREA (TRANSCRIPT CLEAVAGE FACTOR GREA). - ESCHERICHIA COLI.

Assembly ID: 3865160 
Assembly Length: 1173bp 

> 3865160 Strep Assembly -- Assembly id#3865160 
TGCGGCTGAGTTGGGAATTCCTATCGTTAATAAGCGTGTATCGGTGACACCTATTTCTCT
GATTGGGGCAGCGACAGATGCGACGGACTACTGGTTCTGGCAAAAGCGCTTGATAAGGCT 
GCGAAAGAGATTGGTGTGGACTTTATTGGTGGTCTTTCTGCCTTAGAACAAAAAGGTTAT 
CAAAAGGGAGATGAGATTCTCATCAATTCCATTCCTCGCGCTTTGACTGAGACGGATAAG 
GTCTGCTCGTCAGTCAATATCGGCTCAACCAAGTCTGGTATTAATATGACGGCTGTGGCA 
GATATGGGACGAATTTATCAAGGAAACGGCAAATCTTTCAGATATGGGAGCGGCCAAGTT 
GGTTGTATTCGCTAATGCTGTTGAGGACAATCCATTTATGGCGGGTGCCTTTCATGGTGT 
TGGGGAAGCAGATGTTATCATCAATGTCGGAGTTTCTGGTCCTGGTGTGGTGAAACGTGC 
TTTGGAAAAAGTTCGTGGACAGAGCTTTGATGTTAGTAACCCGAAAACCAGTTAAGAAAA 
CTGCCTTTTAAAATCACTCCGTATCCGGTCCAATTGGTTTGGTCAAATGCCCAGTGAGAG 
ACTGGGTGTGGAGTTTGGTATTGTGGACTTGAGTTTGGCACCAACCCCTGCGGTTGGAGA 
CTCTGTGGCACGTGTCCTTGAGGAAATGGGGCTAGAAACAGTTGGCACGCATGGAACGAC 
AGCTGCCTTGGCCCTCTTGAACGACCAAGTTAAAAAGGGTGGAGTGATGGCCTGTAACCA 
GGTCGGTGGTCTATCTGGTGCCTTTATCCCTGTTTCTGAGGATGAAGGAATGATTGCTGC 
AGTGCAAAATGGCTCTCTTAATTTAGAAAAACTAGAAGCTATGACGGCTATCTGTTCTTG 
TTGGATTGGATATGATTGCCATCCCAGAAGATACGCCTGCTGAAACTATTGCGGCTATGA 
TTGCGGATGAAGCAGCAATCGGTGTTATCAACATGAAAACAACAGCTGTTCGTATCATTC 
CCAAAGGAAGAGAAGGCGATATGATTGAGTTTGGTGGTCTATTAGGAACTGCACCCGTTA 
TGAAGGTTAATGGGGCTTCGTCTGTCGACTTCATCTCTCGCGGTGGACAAATCCCAGCAC 
CAATTCATAGTTTTAAAAATTAAGAAAATAGGA

ORF Predictions: 

ORF # Start End Direction Length 



1 136 375 F 80 aa 

> 3865160-1 ORF translation from 136-375, direction F 
VDFIGGLSALEQKGYQKGDEILINSIPRALTETDKVCSSVNIGSTKSGINMTAVADMGRI 
YQGNGKSFRYGSGQVGCIR*

Description : 
unknown 

Assembly ID: 3865172 
Assembly Length: 1209bp

> 3865172 Strep Assembly -- Assembly id#3865172 

TCGGAATCTGAGCTAGTGTAGCTTCCTTAATCTTATCTGATAAGATAGCTGTCATATCAG 
ACTCAATCATTTCCTGGAGCAATCAACATTGACTCGTATATTCCGACTAGCGACCTCGCG 
TGCCACAGACTTGGTAAAGCCAATCAAGCCAGCCTTAGAAGCAGCATAGTTAGCTTGACC 
AATATTCCCCATCAAACCAACAACACTAGACATATTAATGATAGCACCTTCTCTGGCTTT 
CATCATCGGTTTCAAGACTGATTGTGTCATATTAAAGGCACCAGTCAGATTGACCTTGAG 
CACTTTTTCAAAATCTGCTTCTGTCATCTTGAGCATAAGAGTATCTTGGGTAATCCCTGC 
ATTGTTGACCAAAACATCTACTGAACCCAGTTCTGCAATAGCTTGATCAATCATACGCJT
AGCGTCTGCAAAATCTGATACATCTCCTGAAATGGGAACCACCTTGATACCATAGTTTGA 
AAACTCAGCGAGCAATTCTTCTGAGATTGCCCCACGACTGTTTAAGACAATGTTGGCTCC 
TGCTTGAGCAAACTTGTGGGCGATGGCAAGACCAATTCCACGACTCGAACCTGTAATAAA 
GATATTTTTATGTTCTAGTTTCATTTTTTTCCTTTCAAAACTTCTACTTATTTTAGTCTA 
TTTTTCTAAAAGTGCTACTAAACTCGCTTGATCTTCCACATGAGCTAAGTGAGCAGTTTG 
ATCAATTTTTTTAACAAAACCTGACAAGACTTTCCCCGGTCCAATCTCGAATAAAGTTGC 
TTATGCCTGCTTCTTGCATGACCCCAATACTTTCATAGAAACGAACGGGTTCCTTGACCT 
GACGCGTCAAGAGCTGAGCAATGTCCTCTTTTTGCATCACAGCAGCTTCTGTATTGCCGA 
CTAGGGGACAAGTAAAATCTGAAAAACTTACCTGAGCTAGAGTTTCAGCTAGTTTCTGGC 
TAGCAGGCTCAAGGAGAGCGGTGTGAAAGGGACCTGACACCTTAAGAGGAATCAAGCGTT 
TGGCACCTGCTTCTTGCAAAAGTTCAACCGCTCGATCAACTGCAACCACTTCTCCAGCAA 
TGACGATTTGTGCAGGTGTGTTATAGTTGGCTGGAGTAACCACTCCAAGTTCCAGAAGCT 
TTTTGACAGGCTTCTTCAATGACCTCTACTGGCGTATTGAGAACTGCTACCATCTTGCCA 
AGTTCAGCA 

ORF Predictions : 

ORF # Start End Direction Length 



1 731 1123 R 131 aa 

> 3865172-2 ORF translation from 731-1123, direction R 
VVTPANYNTPAQIVIAGEVVAVDRAVELLQEAGAKRLIPLKVSGPFHTALLEPASQKLAE
TLAQVSFSDFTCPLVGNTEAAVMQKEDIAQLLTRQVKEPVRFYESIGVMQEAGISNFIRD 
WTGESLVRFC* 

Description : 

malonyl coenzyme A-acyl carrier protein transacylase (fabD) homolog - Haemophilus influenzae (strain Rd KW20)

Assembly ID: 3865228 
Assembly Length: 813bp 

> 3865228 Strep Assembly -- Assembly id#3865228

ATGACACGTCTGTTCTCTCAAGCAGAAATGGCAGAGTAACAAGCTCGATATTGAGGTAGC 
CGATAAAGAATTGGCTGAATTTGAAGCTCAGATTAAACAGGAAGTGGAAGCTCCAACTTG 
TAGTGAGTCCTCAGGTTGAAGAAGAGCCTCAGCTCATCCAGTTGGCCCAATGTATGAAGA 
ACCAGAAGTAAATCCAGTGCATCCGACAGGTCCAACACCAGCTACAGAAACTGTTGATTC 
AATACCGGGATTTGAAGCACCGCAAGAATCTGTTACAATTTTATAAGAAATATTCTGAGA 
ACAATATCTTATCCTTATATTTCCAGCGAGCAGGAAATGGTGTGAGTCCTGCATTCCCTA 
TCGATAAGATTATCCTCTCAAACTATCAAGTCTGAATCTAGTAAGATTTGACGTTCCCCA 
CGTTACGGGATAAGAGAGAGAAAGACTAAATCTTTTTCCGAATAAAGGTGGTACCACGAT 
TTTCGTCCTTTTTGGAAGTCGTGGTTTTTAATTTGTTATTATTTATAAAGGAGATACCAT 




WO 98/19689 



PCT/US97/19226



GAAACTCAAAGACACCCTTAATCTTGGGAAAACTGAATTCCCAATGCGTGCAGGCCTTCC 
TACCAAAGAGCCAGTTTGGCAAAAGGAATGGGAAGATGCAAAACTTTATCAACGTCGTCA 
AGAATTGAACCAAGGAAAACCTCATTTCACCTTGCATGATGGCCCTCCATACGCTAACGG 
AAATATCCACGTTGGACATGCTATGAACAAGATTTCAAAAGATATCATTGTTCGTTCTAA 
GTCTATGTCAGGATTTTACGCGCCATTTATTCC 

ORF Predictions: 

ORF # Start End Direction Length 



1 197 286 F 30 aa 

> 3865228-1 ORF translation from 197-286, direction F 
VHPTGPTPATETVDSIPGFEAPQESVTIL* 

Description : 
unknown 

Assembly ID: 3865230 
Assembly Length: 953bp 

> 3865230 Strep Assembly -- Assembly id#3865230 

ATCGAATTATTTTGAAACAAGGTGGATCAGCTATTTTGGCCTTGATTAGTATTTTACTCT 
TTAAATACACTTGAAGGTCGATTCTAATCTCGCTAATCCTTTTTAATCCAGAATAAGGGA 
AATATGTTATACTTGTTTTTAAGAAAAAAGTTTCATTGAATTGGTTTTGAGGAGTTAGAA 
ATGAAAGTATTAGTGACAGGTTTTGAGCCCTTTTGAGGCCATTAAAGGTTTACCAGCTGA 
AATCCATGGTGCTGAGGTCCGTTGGCTAGAGGTGCCGACAGTTTTTCACAAATCTGCTCA 
AGTATTGGAAGAAGAGATGAATCGTTATCAACCTGACTTTGTCCTTTGTATTGGGCAAGC 
TGGTGGAAGAACTAGTTTGACACCTGAACGAGTGGCCATTAATCAAGACGATGCACGTAC 
TTCTGATAACGAAGATAATCAACCGATTGACCGTCCCATTCGCCCAGATGGTGCTTCGGC 
CTACTTTAGTAGTTTGCCGATTAAAGCGATGGTTCAAGCTATAAAAAAGAAGGATTACCG 
GCCTCTGTTTCCAATACGGCAGGGACTTTTGTCTGCAGCCATTTGATGTATCAGGCTCTC 
TATTTGGTAGAAAAGAAATTCCCATATGTTAAGGCAGGTTTTATGCATATTCCTTATATG 
ATGGAACAGGTGGTGAACAGACCGACTACTCCAACTATGAGTTTAGTGGATATTCGGCGA 
GGGATAGAAGCAGCAATCGGCGCTATGATAGAACATGGAGATCAGGAACTCAAGTTGGTA 
GGCGGAGAAATTCATTGATAGAAAAAAGCTTGAGGGGAAAACCTTCAAGCTTTTGGACGT 
TTTCGAGCCAATACTGCTCGGTAAAACATAATTTTAGTGCATTGGATATAAGGTAGGAGT 
GAAAAAGTAGCAATGCCAAAGGTAATCCAATTGAGGAAGTACCAAGGAAGAAG 

ORF Predictions: 

ORF # Start End Direction Length 



1 272 586 F 105 aa 

> 3865230-1 ORF translation from 272-586, direction F
VPTVFHKSAQVLEEEMNRYQPDFVLCIGQAGGRTSLTPERVAINQDDARTSDNEDNQPID 
RPIRPDGASAYFSSLPIKAMVQAIKKKDYRPLFPIRQGLLSAAI* 

Description : 

PYRROLIDONE-CARBOXYLATE PEPTIDASE (EC 3.4.19.3) (5-OXOPROLYL-PEPTIDASE). - STREPTOCOCCUS PYOGENES.

Assembly ID: 3865378 
Assembly Length: 1060bp

> 3865378 Strep Assembly -- Assembly id#3865378

CTACTTGAAACAGAACTGAAATTATACCCACTACCTCCCTGATTATCTTCAATGCTTACG
TCTAAATAAACTTCCCCACTATTATTTAGCTTAGCAACAACTGTTATAGTAAAATAACAT 
AAAATTCACATAAATAGATTAGGGAAATCAAAGCAACTTCTAGGAATGTTTTAGCAGTCA 
CAGTGTACTTTCCCAGCATCAAGCCACTATAACTCTGCACATAAAAATGGAGAAGATGGC 
CATCCTCTTCTCCAAATATTAACTTCTTTACAAACCAACTATAGTTGACAAAGAACCTAA 
AATCAATTGATAACACGAGGTCAGGTCGGTCAACTCTTTCAACTGAAGCCCTGTCAACTC 
TTCCCATTTATCAATCTTGTATTGGAGAGAATTGCGGTGCAGATAGAGTTGCTGGGCTGT 
TTAAGTGAGAACAGCACTATTTTCCCAAAGAGAGAGAATGATTTCCTGAATCTGATCTTG 
ATCCAAAATCATCTGGTGTAGACATTCCTTGATTGGCTTCAAGTCCACGAGTCTTTCTCC 
CAGACTCCAAAGATAGAGCTGAGAAAAAGTATGAACACCTTGGTGACCCTGACGCCACCA 
TGTCTTGAACAAATCCCGCTCAGCTTTGATTAAGTCTGATAGGGCTTGATGTCCCGTCTG 
AGACCAAACCTGACCCAACATGATAGAAAGACGAAGTCCAAAGTCATACTCAACCGCTTC 
AATCGTATCACTTAAAATATCTCTTACAGAAGTGTATTTGTCTTGTTGAAGCACGAAAAC 
ATAATCCTGAGATCCGACCTGTAGCACTGTCTGACAATTCGGAAAAAGAGTCCGCATCAT 
ATCTAGCCAAGAAGCCAGATTTTCCTGCTGAAAATAAGAAAGATGGCAATAAACCAACTG 
AATCTTTTTAAAAACTTGCGGTGCCTGTCCCTTGCCTTCAACCAGATAGGAATACCAAGG 
GTTTAGCGAACGAACCTGCTCCTGCTGGGTCAAAAGGGCAACCAACTGCTTTTCACGCTC 
GCTGAGCCCAGCTTCCTCCAGCAAAATCCACTGCTGAGAG 

ORF Predictions: 

ORF # Start End Direction Length 



1 421 807 R 129 aa 

> 3865378-1 ORF translation from 421-807, direction R 
VLQVGSQDYVFVLQQDKYTSVRDILSDTIEAVEYDFGLRLSIMLGQVWSQTGHQALSDLI
KAERDLFKTWWRQGHQGVHTFSQLYLWSLGERLVDLKPIKECLHQMILDQDQIQEIILSL 
WENSAVLT* 

Description : 
unknown 

Assembly ID: 3865470
Assembly Length: 895bp 

> 3865470 Strep Assembly -- Assembly id#3865470 

ATTTTAGACTTTGATGACAATCCTCAGGCGGTTATCATGCCCAATCACGAGGGGCTGGAA 
TTGCAGTTGCCAAAGAAGTGTGTTTATGCATTTTTAGGTGAGGAGATCTGACCGCTATGC 
AAGGGAAGTAGGGGCGGATTGTGTCGGCGAATTCGTTTCTGCTACCAAGACCTATCCAGT 
CTCTTTCATCAACTACAAGGGTGAGGAGGTCTGTCTGGATCAGGCTCCTGCTGGCTCCGC 
TCCAGCAGCCCAGTTTATGGATGGGTTGATTGGCTATGGTGTGGAGCAGCTTATCTCTAC 
TGGGACCTGTGGTGTCCTAGCTGATATAGAGGAAAATGCCTTTCTAGTCCCTGTTCGCGC 
TTTGCGAGATGAGGGAGCCAGTTACCACTATGTGGCACCTTGTCGTTATATGGAAATGCA 
GCCAGAGGCTATTGCTGCTATTGAGGAAGTTTTGGAAGACAGAGGGATTCCTTATGAAGA
AGTCATGACCTGGACGACAGACGGTTTTTACCGAGAAACGGCTGAAAAGGTGGCTTATCG 
TAAGGAAGAAGGCTGTGCTGTTGTGGAGATGGAGTGTTCTGCTCTTGCGGCAGTAGCTCA 
ATTGCGTGGGGTTCTCTGGGGTGAATTGTTGTTCACAGCAAATTCTCTAGCGGACTTGGA 
CCAGTACAACAGTCGTGACTGGGGCTCGGAACCTTTTAATAAGGCGCTAAAACTGAGTTT 
AGCAAGTGTCCACCACCTTTAGTTGTACTGGCAAAGGATTTGTTTTATCATAAAATGTCT 
AGCTCATACTTTTCAAAAATATGTTTAAACGAAGTCACCTTCCTCTTGTCCTAAGCATGT 
TTGAAGTTGGGAAAAATCTTTAAAATCAGAAAAACGTATCATATCAGGTTGATGA 

ORF Predictions: 

ORF # Start End Direction Length 



1 98 742 F 215 aa 

> 3865470-1 ORF translation from 98-742, direction F 
VRRSDRYAREVGADCVGEFVSATKTYPVSFINYKGEEVCLDQAPAGSAPAAQFMDGLIGY 
GVEQLISTGTCGVLADIEENAFLVPVRALRDEGASYHYVAPCRYMEMQPEAIAAIEEVLE 
DRGIPYEEVMTWTTDGFYRETAEKVAYRKEEGCAVVEMECSALAAVAQLRGVLWGELLFT
ANSLADLDQYNSRDWGSEPFNKALKLSLASVHHL* 

Description : 
unknown 

Assembly ID: 3865632 
Assembly Length: 645bp

> 3865632 Strep Assembly -- Assembly id#3865632 

AGGGCTGTCAAGCTTGGTTAGAACGTTTAGAAAAGGAGAGTTAAGGTGGAAAATCTTACG 
AATTTTTACGAAAAGTATCGTGTCTATCTGACTCGTCCACGTTTAGAGCTTTTGGCAGTA 
GTTACCATTGTTTTANGNGCTGTACTCGTCTTTTTTCTAAATATTCCAGGAAAAGGTGTC 
TTAAAACTCGATAATGGAACGATTGTTTATGATGGCAGTCTTGTCCGTGGTAAAATGAAT
GGCCAAGGTACCATTACCTTCCAAAATGGAGACCAATATACAGGTGGCTTCAACAATGCA 
GCCTTCAACGGAAAAGGTACCTTTCAATCTAAAGAAGGCTGGACCTACGAAGGTGATTTT 
GTAAATGGTCAGGCTGAAGGAAAAGGGAAACTAACAACAGAACAAGAAGTCGTTTATGAA 
GGAACTTTTAAACAAGGCGTTTTTCAACAAAAATAAAGCCTCCTTATCAAAGGAGGTATT 
ATTAGAATTACAAGGTAAGCGTTTACCTGTAAATCCCTTTCTTTCCAAATCCCTCTTCCA 
AGCAAGTTTGTGAAATAAAAAATATTTGAAATAAATTTCACAAACTTCAAAGATAAAACC 
TGATAAGAAAAGAAAATGAGAAAAGTTTCGCAAGAGTTTAAAAAT 

ORF Predictions: 

ORF # Start End Direction Length 



1 46 456 F 137 aa 

> 3865632-1 ORF translation from 46-456, direction F 
VENLTNFYEKYRVYLTRPRLELLAVVTIVLXAVLVFFLNIPGKGVLKLDNGTIVYDGSLV
RGKMNGQGTITFQNGDQYTGGFNNAAFNGKGTFQSKEGWTYEGDFVNGQAEGKGKLTTEQ
EVVYEGTFKQGVFQQK*

Description : 
unknown 

Assembly ID: 3865710 
Assembly Length: 572bp 

> 3865710 Strep Assembly -- Assembly id#3865710

GAGATCTGTCTTGACACCAAAAGTGTGGAGTACGCCAGCTAATTCAACGGCGATATAACC 
AGCGCCTAGAATCGCAATTGACTCTGGAAGTTCTTCCCAGGCAAATACATCATCAGAAGA 
GCCACCTAGCTCAGCACCAGGAATATTAGGAATACTTGGATGGGCACCTGTAGCAATCAC 
GATATGTCTAGCACGAATCAGTTCACCATTTACGCTTACAGTATGAGAATCTACAAATTC 
AGCATGACCTTCAATCAAGTCTACACCGTTGCGTTTAAAACTACCATCATAGAGAAGAAC 
GAGCGCGATCAATGTAGGCTTCACGATTGCGACGTAGGGTTGCAAAGTTAAAGTTAAGAT 
CAGTAGTCTCAAAGCCGTAGTCTCCTCCAAATTGATGGAAAGTCTCAGCGATTTGCGCCC 
CGCTACCACATGATTCTTTTAGGAACACAACCGACGTTGACACAGGTTCCACCTAATTTC 
TTTTCCTCAATAACGGCTGCTTTGGCTCCATGTTCCCAGCACGGTTCATGGTAGCGATCC 
TCCGCTACCTCCACGATAGCAATGATATCATA 

ORF Predictions: 

ORF # Start End Direction Length 



1 287 448 R 54 aa 

> 3865710-1 ORF translation from 287-448, direction R 
VFLKESCGSGAQIAETFHQFGGDYGFETTDLNFNFATLRRNREAYIDRARSSL* 
Description :

glutathione reductase (NADPH) (EC 1.6.4.2) - Streptococcus
thermophilus

Provided in Table 2 is information on the direction of the ORF (forward or reverse)
for each polynucleotide in Table 1. Also listed for each ORF are the nucleotide positions of
its start and stop codons (the numeric "Start" and "Stop" columns) and the triplet codon
sequence of each start and stop codon (the "Start" and "Stop" columns containing nucleotide
code). These codons may be shown in the sense or the antisense orientation, for example GTG
and CAC, respectively, for start codons. The "Length" column gives the length of each ORF in
codons, which corresponds to the amino acid length reported for that ORF in Table 1 (for
example, 137 for ORF 3865632-1). The direction of translation on the polynucleotide depicted
is denoted "Forward" for forward or "Reverse" for reverse (that is, on the opposite strand
from the one depicted). As indicated above, the "Assembly ID" number is a unique identifier
assigned to each polynucleotide assembly of Table 1 and allows a correlation between the data
in Tables 1 and 2; an assembly containing more than one ORF appears in more than one row.
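The text does not state this, but the numeric columns of Table 2 are related. In every row, Length equals (Stop - Start + 1)/3, which is the number of codons including the stop codon. In addition, a "Reverse" row lists its codons as reverse complements: CAC for GTG, and TTA, CTA or TCA for TAA, TAG or TGA. The following is a minimal Python sketch of a row consistency check. The row structure and function names are illustrative assumptions, not part of the disclosure.

```python
from typing import NamedTuple

# How the GTG start and the three stop codons appear on the depicted strand.
FORWARD_STARTS = {"GTG"}
FORWARD_STOPS = {"TAA", "TAG", "TGA"}
REVERSE_STARTS = {"CAC"}                 # reverse complement of GTG
REVERSE_STOPS = {"TTA", "CTA", "TCA"}    # reverse complements of TAA, TAG, TGA


class Table2Row(NamedTuple):
    assembly_id: str
    start_codon: str
    stop_codon: str
    start: int      # 1-based position on the depicted strand
    stop: int       # 1-based position on the depicted strand
    length: int     # ORF length in codons, stop codon included
    direction: str  # "Forward" or "Reverse"


def orf_length(start: int, stop: int) -> int:
    """Number of codons spanned by the inclusive coordinates start..stop."""
    span = stop - start + 1
    if span <= 0 or span % 3:
        raise ValueError(f"span {start}-{stop} is not a whole number of codons")
    return span // 3


def row_is_consistent(row: Table2Row) -> bool:
    """Check the Length column and that the codons match the stated direction."""
    if row.direction == "Forward":
        starts, stops = FORWARD_STARTS, FORWARD_STOPS
    elif row.direction == "Reverse":
        starts, stops = REVERSE_STARTS, REVERSE_STOPS
    else:
        return False
    return (
        row.start_codon in starts
        and row.stop_codon in stops
        and orf_length(row.start, row.stop) == row.length
    )


if __name__ == "__main__":
    rows = [
        Table2Row("3865632", "GTG", "TAA", 46, 456, 137, "Forward"),
        Table2Row("3865710", "CAC", "TCA", 287, 448, 54, "Reverse"),
    ]
    for row in rows:
        print(row.assembly_id, row_is_consistent(row))
```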



TABLE 2

Assembly ID   Start codon   Stop codon   Start (nt)   Stop (nt)   Length (aa)   Direction
3049156       CAC           TCA          236          385         50            Reverse
3049862       GTG           TGA          383          526         48            Forward
3112810       CAC           TTA          601          804         68            Reverse
3112866       CAC           TTA          220          513         98            Reverse
3113664       GTG           TAA          165          392         76            Forward

Assembly ID   Start codon   Stop codon   Start (nt)   Stop (nt)   Length (aa)   Direction
3113716       CAC           TTA          94           291         66            Reverse
3174176       GTG           TAA          139          543         135           Forward
3174186       GTG           TAG          83           283         67            Forward
3174374       GTG           TGA          154          294         47            Forward
3174972       CAC           TTA          169          678         170           Reverse
3175138       CAC           TCA          79           945         289           Reverse
3175860       GTG           TAA          51           251         67            Forward
3175918       GTG           TGA          212          535         108           Forward
3811220       CAC           CTA          316          873         186           Reverse
3811436       CAC           TTA          1164         1511        116           Reverse
3811984       GTG           TGA          134          454         107           Forward
3857228       CAC           TCA          1141         1356        72            Reverse
3857842       GTG           TAA          45           341         99            Forward
3857996       GTG           TAA          58           456         133           Forward
3858236       CAC           CTA          1            261         87            Reverse
3858264       CAC           TCA          439          1365        309           Reverse
3858610       CAC           TTA          374          949         192           Reverse
3858716       CAC           CTA          238          402         55            Reverse
3859124       CAC           CTA          73           453         127           Reverse
3859244       CAC           TTA          310          462         51            Reverse
3859250       CAC           CTA          244          402         53            Reverse
3859588       CAC           TTA          102          443         114           Reverse
3859774       CAC           CTA          9            131         41            Reverse
3860140       GTG           TAA          302          511         70            Forward
3860140       GTG           TAA          605          856         84            Forward
3860206       CAC           TTA          898          1056        53            Reverse
3860270       GTG           TAG          346          966         207           Forward
3860438       GTG           TAG          1            276         92            Forward
3860438       GTG           TGA          460          1128        223           Forward
3860544       GTG           TAA          222          689         156           Forward
3860558       CAC           TTA          717          1376        220           Reverse
3860568       GTG           TAA          1040         1291        84            Forward
3860582       GTG           TGA          356          1027        224           Forward
3860724       GTG           TGA          139          498         120           Forward

Assembly ID   Start codon   Stop codon   Start (nt)   Stop (nt)   Length (aa)   Direction
3860724       GTG           TGA          686          1024        113           Forward
3860858       GTG           TAG          610          807         66            Forward
3860890       GTG           TAG          397          486         30            Forward
3860952       CAC           TTA          449          715         89            Reverse
3860962       CAC           TTA          152          646         165           Reverse
3861268       CAC           TTA          457          645         63            Reverse
3861270       CAC           TTA          627          824         66            Reverse
3861288       CAC           CTA          357          572         72            Reverse
3861306       GTG           TAA          717          1208        164           Forward
3861306       GTG           TAA          1201         1410        70            Forward
3861334       GTG           TAA          76           975         300           Forward
3864148       GTG           TAG          212          940         243           Forward
3864148       GTG           TAA          1202         1753        184           Forward
3864148       GTG           TAA          2750         3037        96            Forward
3864172       GTG           TAG          311          862         184           Forward
3864180       CAC           TTA          930          1616        229           Reverse
3864184       GTG           TGA          197          670         158           Forward
3864184       GTG           TAA          612          1304        231           Forward
3864194       CAC           CTA          1084         1380        99            Reverse
3864338       GTG           TGA          552          1100        183           Forward
3864360       GTG           TAA          47           1078        344           Forward
3864388       GTG           TGA          1239         1586        116           Forward
3864406       CAC           TTA          263          958         232           Reverse
3864452       CAC           TCA          1079         1201        41            Reverse
3864458       GTG           TAA          797          1105        103           Forward
3864458       GTG           TGA          1179         1391        71            Forward
3864474       CAC           CTA          68           247         60            Reverse
3864474       CAC           TTA          644          1528        295           Reverse
3864510       CAC           TTA          1164         1640        159           Reverse
3864526       CAC           TTA          845          1660        272           Reverse
3864548       GTG           TGA          687          1055        123           Forward
3864548       GTG           TAA          979          1932        318           Forward
3864582       CAC           TTA          317          550         78            Reverse
3864604       CAC           CTA          1            141         47            Reverse
3864604       CAC           CTA          1513         1803        97            Reverse
3864610       GTG           TAA          427          1305        293           Forward
3864716       GTG           TAA          57           272         72            Forward
3864718       GTG           TGA          77           1474        466           Forward
3864802       CAC           TTA          92           550         153           Reverse

Assembly ID   Start codon   Stop codon   Start (nt)   Stop (nt)   Length (aa)   Direction
3864854       CAC           CTA          324          548         75            Reverse
3864862       CAC           CTA          431          1003        191           Reverse
3864888       CAC           TTA          10           657         216           Reverse
3864898       GTG           TAA          130          1029        300           Forward
3864938       GTG           TGA          883          1326        148           Forward
3864956       GTG           TAA          1030         1251        74            Forward
3864958       CAC           TCA          1427         1711        95            Reverse
3865022       CAC           TCA          279          1271        331           Reverse
3865036       GTG           TAG          79           492         138           Forward
3865054       CAC           TCA          302          793         164           Reverse
3865102       CAC           CTA          27           731         235           Reverse
3865156       CAC           TTA          416          808         131           Reverse
3865160       GTG           TAA          136          375         80            Forward
3865172       CAC           TTA          731          1123        131           Reverse
3865228       GTG           TAA          197          286         30            Forward
3865230       GTG           TGA          272          586         105           Forward
3865378       CAC           TTA          421          807         129           Reverse
3865470       GTG           TAG          98           742         215           Forward
3865632       GTG           TAA          46           456         137           Forward
3865710       CAC           TCA          287          448         54            Reverse



EXAMPLES 

The examples below are carried out using standard techniques, which are well known 
and routine to those of skill in the art, except where otherwise described in detail. The examples 
are illustrative, but do not limit the invention. 
Example 1 

Isolation of DNA coding for a virulence gene in Streptococcus pneumoniae 

As mentioned above, each of the DNAs disclosed herein, because it includes an intact
open reading frame, is useful to a greater or lesser extent as a screen for identifying
antimicrobial compounds. A useful approach for selecting the preferred DNA sequences for
screen development is evaluation by insertion-duplication mutagenesis. This system,
disclosed by Morrison et al., J. Bacteriol. 159:870 (1984), is applied as follows.

Briefly, random fragments of Streptococcus pneumoniae strain 0100993 DNA are
generated enzymatically (by restriction endonuclease digestion) or physically (by
sonication-based shearing), followed by gel fractionation and end repair employing T4 DNA
polymerase. It is preferred that the DNA fragments so produced are in the range of 200-400
base pairs, a size sufficient to ensure homologous recombination and a representative
library in E. coli. The fragments are then inserted into appropriately tagged plasmids as
described in Hensel et al., Science 269: 400-403 (1995). Although a number of plasmids can
be used for this purpose, a particularly useful plasmid is pJDC9, described by Pearce et al.,
Mol. Microbiol. 9:1037 (1993), which carries the erm gene, facilitating erythromycin
selection in either E. coli or S. pneumoniae, and which has previously been modified by
incorporation of DNA sequence tags into one of the polylinker cloning sites. The tagged
plasmids are introduced into the appropriate S. pneumoniae strain, selected, inter alia, on
the basis of serotype and virulence in a murine model of pneumococcal pneumonia.

It is appreciated that a seventeen amino acid competence factor exists (Havarstein et
al., Proc. Natl. Acad. Sci. USA 92:11140-44 (1995)) and may be usefully employed in this
protocol to increase transformation frequencies. A proportion of transformants is analysed
to verify homologous integration and as a check on stability. Unwanted levels of reversion
are minimized because the duplicated regions will be short (200-400 bp); however, if
significant reversion rates are encountered, they may be modulated by maintaining
antibiotic selection during the growth of the transformants in culture and/or during growth
in the animal.

The S. pneumoniae transformants are pooled for inoculation into mice, e.g., Swiss
and/or C57Bl/6. Preliminary experiments are conducted to establish the optimum
complexity of the pools and level of inoculum. A particularly useful model has been
described by Veber et al. (J. Antimicrob. Chemother. 32:432 (1993)), in which inocula of
10^5 cfu are introduced by mouth to the trachea. Strain differences are observed with
respect to onset of disease, e.g., 3-4 days for Swiss mice and 8-10 days for C57Bl/6 mice.
Infection yields in the lungs approach 10^8 cfu/lung. Intraperitoneal (IP) administration is
also possible when genes mediating blood stream infection are evaluated. Following
optimization of parameters of the infection model, the mutant bank, normally comprising
several thousand strains, is subjected to the virulence test. Mutants with attenuated
virulence are identified by hybridization analysis using the labelled tags from the "input"
and "recovered" pools as probes, as described in Hensel et al., Science 269: 400-403
(1995). S. pneumoniae DNA is colony blotted or dot blotted, and DNA flanking the
integrated plasmid is cloned by plasmid rescue in E. coli (Morrison et al., J. Bacteriol.
159:870 (1984)) and sequenced. Following sequencing, the DNA is compared to the
nucleotide sequences given herein, the appropriate ORF is identified, and its function is
confirmed, for example by knock-out studies.
Expression vectors providing the selected protein are prepared and the protein is configured
in an appropriate screen for the identification of anti-microbial agents. Alternatively, 
genomic DNA libraries are probed with restriction fragments flanking the integrated 
plasmid to isolate full-length cloned virulence genes whose function can be confirmed by 
"knock-out" studies or other methods, which are then expressed and incorporated into a 
screen as described above. 
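The comparison of the "input" and "recovered" pools described above can be illustrated with a small Python sketch. The sketch makes several hypothetical assumptions. Each tag's hybridization signal has already been quantified, for example as a normalized dot-blot intensity. A tag is flagged as a candidate attenuated mutant when its recovered-to-input signal ratio falls below a chosen cutoff. The 0.1 cutoff, the minimum input signal and the function name are illustrative choices, not values given in the text.

```python
from typing import Dict, List


def attenuated_candidates(
    input_signal: Dict[str, float],
    recovered_signal: Dict[str, float],
    ratio_cutoff: float = 0.1,  # hypothetical cutoff, to be set empirically
    min_input: float = 0.2,     # hypothetical floor: tag must be detectable in the inoculum
) -> List[str]:
    """Return tags present in the input pool but depleted after passage in the animal.

    A tag missing from recovered_signal is treated as having zero signal.
    """
    candidates = []
    for tag, before in input_signal.items():
        if before < min_input:
            continue  # too weak in the input pool to interpret its later absence
        after = recovered_signal.get(tag, 0.0)
        if after / before < ratio_cutoff:
            candidates.append(tag)
    return sorted(candidates)


if __name__ == "__main__":
    input_pool = {"tag01": 1.0, "tag02": 0.8, "tag03": 0.9, "tag04": 0.05}
    recovered_pool = {"tag01": 0.9, "tag03": 0.05}
    print(attenuated_candidates(input_pool, recovered_pool))  # ['tag02', 'tag03']
```

Each flagged tag would then be followed up as the text describes: the flanking DNA is rescued, sequenced and matched to an ORF.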
What is claimed is:

1. An isolated polynucleotide comprising a polynucleotide
sequence selected from the group consisting of:

(a) a polynucleotide having at least a 70% identity to a polynucleotide encoding a
polypeptide comprising an amino acid sequence of Table 1;

(b) a polynucleotide having at least a 70% identity to a polynucleotide encoding a
mature polypeptide expressed by the gene contained in the S. pneumoniae of the deposited strain
that was sequenced to obtain a polynucleotide sequence of Table 1;

(c) a polynucleotide encoding a polypeptide comprising an amino acid sequence
which is at least 70% identical to an amino acid sequence of Table 1;

(d) a polynucleotide which is complementary to the polynucleotide of (a), (b) or (c); and

(e) a polynucleotide comprising at least 15 sequential bases of the polynucleotide
of (a), (b), (c) or (d).

2. The polynucleotide of Claim 1 wherein the polynucleotide is DNA. 

3. The polynucleotide of Claim 1 wherein the polynucleotide is RNA. 

4. The polynucleotide of Claim 2 comprising the nucleic acid sequence selected 
from the group consisting of the nucleic acid sequences set forth in Table 1.

5. The polynucleotide of Claim 2 which encodes a polypeptide comprising an
amino acid sequence selected from the group consisting of the amino acid sequences
set forth in Table 1.

6. A vector comprising the polynucleotide of Claim 1.

7. A host cell comprising the vector of Claim 6. 

8. A process for producing a polypeptide comprising: expressing from the host 
cell of Claim 7 a polypeptide encoded by said DNA. 

9. A process for producing a polypeptide or fragment comprising culturing a 
host of claim 7 under conditions sufficient for the production of said polypeptide or 
fragment. 

10. A polypeptide comprising an amino acid sequence which is at least 70% 
identical to an amino acid sequence selected from the group consisting of the amino acid 
sequences set forth in Table 1.

11. A polypeptide comprising an amino acid sequence selected from the group 
consisting of the amino acid sequences set forth in Table 1.

12. An antibody against the polypeptide of claim 10. 




13. An antagonist or agonist of the activity or expression of the polypeptide of
claim 10. 

14. A method for the treatment or prevention of disease of an individual 
comprising: administering to the individual a therapeutically effective amount of the polypeptide 
of claim 10. 

15. A method for the treatment of an individual having need to inhibit a bacterial 
polypeptide comprising: administering to the individual a therapeutically effective amount of the 
antagonist of Claim 13. 

16. A process for diagnosing a disease related to expression or activity of the 
polypeptide of claim 10 in an individual comprising: 

(a) determining a nucleic acid sequence encoding said polypeptide, and/or 

(b) analyzing for the presence or amount of said polypeptide in a sample derived from 
the individual. 

17. A method for identifying compounds which interact with and inhibit or activate 
an activity of the polypeptide of claim 10 comprising: 

contacting a composition comprising the polypeptide with the compound to be screened 
under conditions to permit interaction between the compound and the polypeptide to assess the 
interaction of a compound, such interaction being associated with a second component capable of 
providing a detectable signal in response to the interaction of the polypeptide with the 
compound; 

and determining whether the compound interacts with and activates or inhibits an 
activity of the polypeptide by detecting the presence or absence of a signal generated from the 
interaction of the compound with the polypeptide. 

18. A method for inducing an immunological response in a mammal which 
comprises inoculating the mammal with the polypeptide of claim 10, or a fragment or variant 
thereof, adequate to produce antibody and/or T cell immune response to protect said animal 
from disease. 

19. A method of inducing immunological response in a mammal which comprises 
delivering a nucleic acid vector to direct expression of a polypeptide of claim 10, or fragment 
or a variant thereof, for expressing said polypeptide, or a fragment or a variant thereof in 
vivo in order to induce an immunological response to produce antibody and/or T cell
immune response to protect said animal from disease. 

20. A polynucleotide comprising a polynucleotide sequence selected from the 
group consisting of the first ten polynucleotide sequences from the top of Table 1.
21. A polypeptide comprising a polypeptide encoded by the polynucleotide of
claim 20.

22. The isolated polynucleotide of claim 1 wherein said polynucleotide is selected from
the group consisting of: 

(a) a polynucleotide having at least a 90% identity to a polynucleotide encoding a
polypeptide comprising the amino acid sequence of Table 1;

(b) a polynucleotide having at least a 90% identity to a polynucleotide encoding the
same mature polypeptide expressed by the gene contained in the S. pneumoniae of the deposited
strain that was sequenced to obtain a polynucleotide sequence of Table 1;

(c) a polynucleotide encoding a polypeptide comprising an amino acid sequence
which is at least 90% identical to the amino acid sequence of Table 1;

(d) a polynucleotide which is complementary to the polynucleotide of (a), (b) or (c); and

(e) a polynucleotide comprising at least 15 sequential bases of the polynucleotide
of (a), (b), (c) or (d).

23. The isolated polynucleotide of claim 1 selected from the group consisting of: 

(a) a polynucleotide having at least a 95% identity to a polynucleotide encoding a
polypeptide comprising the amino acid sequence of Table 1;

(b) a polynucleotide having at least a 95% identity to a polynucleotide encoding the
same mature polypeptide expressed by the gene contained in the S. pneumoniae of the deposited
strain that was sequenced to obtain a polynucleotide sequence of Table 1;

(c) a polynucleotide encoding a polypeptide comprising an amino acid sequence
which is at least 95% identical to the amino acid sequence of Table 1;

(d) a polynucleotide which is complementary to the polynucleotide of (a), (b) or (c); and

(e) a polynucleotide comprising at least 15 sequential bases of the polynucleotide
of (a), (b), (c) or (d).

24. An isolated polynucleotide comprising a polynucleotide sequence selected from 
the group consisting of: 

(a) a polynucleotide having at least a 50% identity to a polynucleotide encoding a 
polypeptide comprising the amino acid sequence of Table 1 and obtained from a prokaryotic 
species other than S. pneumoniae; 
(b) a polynucleotide encoding a polypeptide comprising an amino acid sequence
which is at least 50% identical to the amino acid sequence of Table 1 and obtained from a
prokaryotic species other than S. pneumoniae; and

(c) a polynucleotide which is complementary to the polynucleotide of (a) or (b). 

25. An isolated Streptococcal polypeptide having one of the amino acid 
sequences given in Table 1. 

26. An isolated nucleic acid encoding one of the amino acid sequences of 
Claim 1 and nucleic acid sequences capable of hybridizing therewith under stringent 
conditions. 

27. Recombinant vectors comprising the nucleic acid sequences of 
Claim 26 and host cells transformed or transfected therewith. 

28. A method of identifying an antimicrobial compound comprising contacting 
candidate compounds with a polypeptide of Claim 1 and selecting those compounds capable 
of inhibiting the bioactivity of said polypeptide. 

29. Antimicrobial compounds identified by the method of Claim 28. 

30. An isolated Streptococcal polypeptide having one of the amino acid 
sequences given in Table 1.

31. An isolated nucleic acid encoding one of the amino acid sequences of 
Claim 30 and nucleic acid sequences capable of hybridizing therewith under stringent 
conditions. 

32. Recombinant vectors comprising the nucleic acid sequences of 
Claim 31 and host cells transformed or transfected therewith. 

33. A method of identifying an antimicrobial compound comprising contacting 
candidate compounds with a polypeptide of Claim 30 and selecting those compounds 
capable of inhibiting the bioactivity of said polypeptide. 

34. Antimicrobial compounds identified by the method of Claim 33. 
NOVEL CODING SEQUENCES
FIELD OF THE INVENTION 

This invention relates to newly identified polynucleotides and polypeptides, and their 
production and uses, as well as their variants, agonists and antagonists, and their uses. In 
particular, in these and in other regards, the invention relates to novel polynucleotides and 
polypeptides set forth in Table 1.
BACKGROUND OF THE INVENTION 

The Streptococci make up a medically important genus of microbes known to
cause several types of disease in humans, including otitis media, pneumonia and 
meningitis. Since its isolation more than 100 years ago, Streptococcus pneumoniae (herein 
S. pneumoniae) has been one of the more intensively studied microbes. For example, much 
of our early understanding that DNA is, in fact, the genetic material was predicated on the 
work of Griffith and of Avery, Macleod and McCarty using this microbe. Despite the vast 
amount of research with S. pneumoniae, many questions concerning the virulence of this 
microbe remain. 

While certain Streptococcal factors associated with pathogenicity have been
identified, e.g., capsule polysaccharides, peptidoglycans, pneumolysins, PspA, complement
factor H binding component, autolysin, neuraminidase, peptide permeases, hydrogen
peroxide and IgA1 protease, the list is certainly not complete. Further, very little is known
concerning the temporal expression of such genes during infection and disease progression
in a mammalian host. Discovering the sets of genes the bacterium is likely to be expressing 
at the different stages of infection, particularly when an infection is established, provides 
critical information for the screening and characterization of novel antibacterials which can 
interrupt pathogenesis. In addition to providing a fuller understanding of known proteins, 
such an approach will identify previously unrecognised targets. 

GUG is used as an initiation codon, rather than AUG, for a significant number
of mRNAs in both Gram-positive and Gram-negative bacteria. Statistics on the frequency
of NTG codons as start codons for several bacterial species are available online at
http://biochem.otago.ac.nz:800/Transterm/home_page.html.

A discussion of initiation codons in B. subtilis is set forth in Vellanoweth, R.L. 1993,
in Bacillus subtilis and Other Gram-Positive Bacteria: Biochemistry, Physiology, and
Molecular Genetics, Sonenshein, Hoch, Losick, Eds., Amer. Soc. Microbiol., Washington
DC, pp. 699-711. Vellanoweth indicates that a major difference between B. subtilis and the
Gram-negative organisms lies in the choice of initiation codon: 91% of the sequenced E. coli
genes start with AUG, whereas about 30% of B. subtilis and other clostridial-branch genes
start with UUG or GUG. Moreover, CUG functions as a start codon in B. subtilis.
Mutations of an AUG initiation codon to GUG or UUG often cause decreased expression in
B. subtilis and E. coli. Generally, translation efficiency is higher with AUG initiation
codons. A strong Shine-Dalgarno ribosome binding site, however, can compensate almost
fully for a weak initiation codon. It has been reported that genes with a range of expression
levels have initiation codons other than ATG in Gram-positive bacteria (Vellanoweth, R.L.
1993, in Bacillus subtilis and Other Gram-Positive Bacteria: Biochemistry, Physiology, and
Molecular Genetics, Sonenshein, Hoch, Losick, Eds., Amer. Soc. Microbiol., Washington
DC, pp. 699-711).
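Frequencies such as the 91% AUG usage quoted above are simple proportions over a set of annotated genes. The following is a minimal Python sketch of the tally. It assumes that each coding sequence is supplied 5' to 3', beginning at its initiation codon, in either the DNA or the RNA alphabet. The input list in the example is hypothetical and is not data from the cited work.

```python
from collections import Counter
from typing import Dict, Iterable


def start_codon_usage(coding_sequences: Iterable[str]) -> Dict[str, float]:
    """Return the fraction of sequences that begin with each initiation codon.

    RNA input is accepted: U is converted to T, so AUG and ATG are counted together.
    """
    counts = Counter(
        seq[:3].upper().replace("U", "T") for seq in coding_sequences if len(seq) >= 3
    )
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()} if total else {}


if __name__ == "__main__":
    genes = ["ATGAAACGT", "GTGAAATAA", "TTGCCCTGA", "ATGGCTTAG"]  # hypothetical
    print(start_codon_usage(genes))  # {'ATG': 0.5, 'GTG': 0.25, 'TTG': 0.25}
```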

Provided herein are ORF sequences from genes possessing GUG initiation codons 
and proteins expressed therefrom and homologues thereto to be used for screening for 
antimicrobial compounds. Clearly, there is a need for polypeptide and polynucleotide 
sequences that may be used to screen for antimicrobial compounds and which may also be
used to determine the roles of such sequences in pathogenesis of infection, dysfunction and
disease. There is, therefore, also a need for identification and characterization of such
sequences which may play a role in preventing, ameliorating or correcting infections, 
dysfunctions or diseases. 

The polypeptides of the invention have amino acid sequence homology to a known 
protein(s) as set forth in Table 1.
SUMMARY OF THE INVENTION 

It is an object of the invention to provide polypeptides that have been identified as 
novel polypeptides by homology between an amino acid sequence selected from the group 
consisting of the sequences set out in Table 1 and a known amino acid sequence or sequences 
of other proteins such as the protein identities listed in Table 1.

It is a further object of the invention to provide polynucleotides that encode novel 
polypeptides, particularly polynucleotides that encode polypeptides of Streptococcus 
pneumoniae. 

In a particularly preferred embodiment of the invention the polynucleotide comprises 
a region encoding a polypeptide comprising a sequence selected from the group
consisting of the sequences set out in Table 1, or a variant of any of these sequences. 

SUBSTITUTE SHEET (RULE 26)

In another particularly preferred embodiment of the invention there is a novel 
protein from Streptococcus pneumoniae comprising an amino acid sequence selected from 
the group consisting of the sequences set out in Table 1, or a variant of any of these 
sequences. 

In accordance with another aspect of the invention there is provided an isolated
nucleic acid molecule encoding a mature polypeptide expressible by the deposited
Streptococcus pneumoniae strain 0100993.

In a further aspect of the invention there are provided isolated nucleic acid molecules
encoding a polypeptide of the invention, particularly a Streptococcus pneumoniae polypeptide,
including mRNAs, cDNAs and genomic DNAs. Further embodiments of the invention
include biologically, diagnostically, prophylactically, clinically or therapeutically useful 
variants thereof, and compositions comprising the same. 

In accordance with another aspect of the invention, there is provided the use of a 
polynucleotide of the invention for therapeutic or prophylactic purposes, in particular 
genetic immunization. Among the particularly preferred embodiments of the invention are 
naturally occurring allelic variants of a polypeptide of the invention and polypeptides 
encoded thereby. 

In another aspect of the invention there are provided novel polypeptides of
Streptococcus pneumoniae as well as biologically, diagnostically, prophylactically, clinically 
or therapeutically useful variants thereof, and compositions comprising the same. 

Among the particularly preferred embodiments of the invention are variants of the 
polypeptides of the invention encoded by naturally occurring alleles of their genes. 

In a preferred embodiment of the invention there are provided methods for producing 
the aforementioned polypeptides. 

In accordance with yet another aspect of the invention, there are provided inhibitors 
to such polypeptides, useful as antibacterial agents, including, for example, antibodies. 

In accordance with certain preferred embodiments of the invention, there are provided
products, compositions and methods for assessing expression of the polypeptides and
polynucleotides of the invention; treating disease, including, for example, otitis media,
conjunctivitis, pneumonia, bacteremia, meningitis, sinusitis, pleural empyema and
endocarditis, and most particularly meningitis, such as infection of cerebrospinal fluid;
assaying genetic variation; and administering a polypeptide or polynucleotide of the
invention to an organism to raise an immunological response against a bacterium, especially a
Streptococcus pneumoniae bacterium.

In accordance with certain preferred embodiments of this and other aspects of the 
invention there are provided polynucleotides that hybridize to a polynucleotide sequence of 
the invention, particularly under stringent conditions. 

In certain preferred embodiments of the invention there are provided antibodies 
against polypeptides of the invention. 

In other embodiments of the invention there are provided methods for identifying 
compounds which bind to or otherwise interact with and inhibit or activate an activity of a 
polypeptide or polynucleotide of the invention comprising: contacting a polypeptide or 
polynucleotide of the invention with a compound to be screened under conditions to permit 
binding to or other interaction between the compound and the polypeptide or polynucleotide 
to assess the binding to or other interaction with the compound, such binding or interaction 
being associated with a second component capable of providing a detectable signal in 
response to the binding or interaction of the polypeptide or polynucleotide with the 
compound; and determining whether the compound binds to or otherwise interacts with and 
activates or inhibits an activity of the polypeptide or polynucleotide by detecting the presence 
or absence of a signal generated from the binding or interaction of the compound with the 
polypeptide or polynucleotide. 
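The final "determining" step of such a screen amounts to comparing each compound's signal with a control. The following is a minimal Python sketch. It assumes a readout that rises with the activity of the polypeptide, a no-compound control measured under the same conditions, and an illustrative two-fold threshold. None of these parameters is specified in the text, and the function name is hypothetical.

```python
def classify_compound(signal: float, control_signal: float, fold: float = 2.0) -> str:
    """Classify a test compound by its assay signal relative to a no-compound control.

    Assumes the signal increases with polypeptide activity; 'fold' is a
    hypothetical hit threshold.
    """
    if control_signal <= 0 or fold <= 1:
        raise ValueError("control_signal must be positive and fold greater than 1")
    ratio = signal / control_signal
    if ratio <= 1 / fold:
        return "inhibitor"
    if ratio >= fold:
        return "activator"
    return "no effect"


if __name__ == "__main__":
    control = 100.0
    for name, value in {"cmpd-A": 30.0, "cmpd-B": 95.0, "cmpd-C": 240.0}.items():
        print(name, classify_compound(value, control))
```

In practice, replicate wells and a statistical cutoff would replace the single fixed fold change used here.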

In accordance with yet another aspect of the invention, there are provided agonists 
and antagonists of the polypeptides and polynucleotides of the invention, preferably 
bacteriostatic or bactericidal agonists and antagonists.

In a further aspect of the invention there are provided compositions comprising a 
polynucleotide or a polypeptide of the invention for administration to a cell or to a 
multicellular organism. 

Various changes and modifications within the spirit and scope of the disclosed 
invention will become readily apparent to those skilled in the art from reading the following 
descriptions and from reading the other parts of the present disclosure. 
GLOSSARY 

The following definitions are provided to facilitate understanding of certain terms 
used frequently herein. 
"Disease(s)" means any bacterial infection, but preferably a streptococcal infection,
such as otitis media, conjunctivitis, pneumonia, bacteremia, meningitis, sinusitis, pleural
empyema, endocarditis, and infection of cerebrospinal fluid.

"Host cell" is a cell which has been transformed or transfected, or is capable of
transformation or transfection by an exogenous polynucleotide sequence.

"Identity," as known in the art, is a relationship between two or more polypeptide 
sequences or two or more polynucleotide sequences, as determined by comparing the 
sequences. In the art, "identity" also means the degree of sequence relatedness between 
polypeptide or polynucleotide sequences, as the case may be, as determined by the match 
between strings of such sequences. "Identity" and "similarity" can be readily calculated by
known methods, including but not limited to those described in (Computational Molecular
Biology, Lesk, A.M., ed., Oxford University Press, New York, 1988; Biocomputing:
Informatics and Genome Projects, Smith, D.W., ed., Academic Press, New York, 1993;
Computer Analysis of Sequence Data, Part I, Griffin, A.M., and Griffin, H.G., eds., Humana
Press, New Jersey, 1994; Sequence Analysis in Molecular Biology, von Heijne, G.,
Academic Press, 1987; and Sequence Analysis Primer, Gribskov, M. and Devereux, J., eds.,
M Stockton Press, New York, 1991; and Carrillo, H., and Lipman, D., SIAM J. Applied
Math., 48: 1073 (1988)). Preferred methods to determine identity are designed to give the
largest match between the sequences tested. Methods to determine identity and similarity
are codified in publicly available computer programs. Preferred computer program
methods to determine identity and similarity between two sequences include, but are not
limited to, the GCG program package (Devereux, J., et al., Nucleic Acids Research 12(1):
387 (1984)), BLASTP, BLASTN, and FASTA (Altschul, S.F. et al., J. Molec. Biol. 215:
403-410 (1990)). The BLASTX program is publicly available from NCBI and other
sources (BLAST Manual, Altschul, S., et al., NCBI NLM NIH Bethesda, MD 20894;
Altschul, S., et al., J. Mol. Biol. 215: 403-410 (1990)). As an illustration, by a
polynucleotide having a nucleotide sequence having at least, for example, 95% "identity" to 
a reference nucleotide sequence it is intended that the nucleotide sequence of the tested 
polynucleotide is identical to the reference sequence except that the polynucleotide 
sequence may include up to five point mutations per each 100 nucleotides of the reference 
nucleotide sequence. In other words, to obtain a polynucleotide having a nucleotide 
sequence at least 95% identical to a reference nucleotide sequence, up to 5% of the 
nucleotides in the reference sequence may be deleted or substituted with another 
nucleotide, or a number of nucleotides up to 5% of the total nucleotides in the reference
sequence may be inserted into the reference sequence. These mutations of the reference
sequence may occur at the 5' or 3' terminal positions of the reference nucleotide sequence
or anywhere between those terminal positions, interspersed either individually among
nucleotides in the reference sequence or in one or more contiguous groups within the
reference sequence. Analogously, by a polypeptide having an amino acid sequence having
at least, for example, 95% identity to a reference amino acid sequence it is intended that the
test amino acid sequence of the polypeptide is identical to the reference sequence except
that the polypeptide sequence may include up to five amino acid alterations per each 100
amino acids of the reference amino acid sequence. In other words, to obtain a polypeptide having an
amino acid sequence at least 95% identical to a reference amino acid sequence, up to 5% of 
the amino acid residues in the reference sequence may be deleted or substituted with 
another amino acid, or a number of amino acids up to 5% of the total amino acid residues in 
the reference sequence may be inserted into the reference sequence. These alterations of 
the reference sequence may occur at the amino or carboxy terminal positions of the 
reference amino acid sequence or anywhere between those terminal positions, interspersed 
either individually among residues in the reference sequence or in one or more contiguous 
groups within the reference sequence. 
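The 95% example above reduces to simple arithmetic once an alignment is in hand. The following Python sketch is illustrative only: it assumes the two sequences have already been aligned by some external method (such as the programs cited above), with gaps written as "-", and the function names are hypothetical.

```python
import math


def max_alterations(ref_length: int, percent_identity: float) -> int:
    """Maximum number of alterations (substitutions, deletions or insertions)
    allowed for a test sequence to be at least `percent_identity` identical to
    a reference of `ref_length` residues, read as "up to (100 - identity)
    alterations per each 100 residues of the reference"."""
    return math.floor(ref_length * (100 - percent_identity) / 100)


def count_alterations(aligned_ref: str, aligned_test: str) -> int:
    """Count alterations in a pre-computed pairwise alignment.

    A mismatched column is a substitution, a '-' in the test sequence is a
    deletion, and a '-' in the reference is an insertion; each counts once.
    """
    if len(aligned_ref) != len(aligned_test):
        raise ValueError("aligned sequences must have the same length")
    return sum(1 for r, t in zip(aligned_ref, aligned_test) if r != t)


def meets_identity(aligned_ref: str, aligned_test: str, percent_identity: float) -> bool:
    """True if the test sequence stays within the alteration budget."""
    ref_length = len(aligned_ref.replace("-", ""))  # ungapped reference length
    return count_alterations(aligned_ref, aligned_test) <= max_alterations(
        ref_length, percent_identity
    )


reference = "ACGT" * 25  # hypothetical 100-nt reference
print(max_alterations(100, 95))                                 # 5
print(meets_identity(reference, "NNNNN" + reference[5:], 95))   # True: 5 substitutions
print(meets_identity(reference, "NNNNNN" + reference[6:], 95))  # False: 6 substitutions
```

Counting every non-matching alignment column as one alteration mirrors the wording above, where deletions, substitutions and insertions all draw on the same 5% allowance.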

"Isolated" means altered "by the hand of man" from its natural state, if it occurs 
in nature, it has been changed or removed from its original environment, or both. For 
example, a polynucleotide or a polypeptide naturally present in a living organism is not 
"isolated," but the same polynucleotide or polypeptide separated from the coexisting materials 
of its natural state is "isolated", as the term is employed herein. 

"Polynucleotide(s)" generally refers to any polyribonucleotide or 
polydeoxyribonucleotide, which may be unmodified RNA or DNA or modified RNA or DNA.
"Polynucleotide(s)" include, without limitation, single- and double-stranded DNA, DNA that
is a mixture of single- and double-stranded regions or single-, double- and triple-stranded
regions, single- and double-stranded RNA, and RNA that is a mixture of single- and double-
stranded regions, hybrid molecules comprising DNA and RNA that may be single-stranded or,
more typically, double-stranded, or triple-stranded regions, or a mixture of single- and double- 
stranded regions. In addition, "polynucleotide" as used herein refers to triple-stranded regions 
comprising RNA or DNA or both RNA and DNA. The strands in such regions may be from 
the same molecule or from different molecules. The regions may include all of one or more 
of the molecules, but more typically involve only a region of some of the molecules. One of
the molecules of a triple-helical region often is an oligonucleotide. As used herein, the term 
"polynucleotide(s)" also includes DNAs or RNAs as described above that contain one or more 
modified bases. Thus, DNAs or RNAs with backbones modified for stability or for other 
reasons are "polynucleotide(s)" as that term is intended herein. Moreover, DNAs or RNAs 
comprising unusual bases, such as inosine, or modified bases, such as tritylated bases, to name 
just two examples, are polynucleotides as the term is used herein. It will be appreciated that a 
great variety of modifications have been made to DNA and RNA that serve many useful 
purposes known to those of skill in the art. The term "polynucleotide(s)" as it is employed
herein embraces such chemically, enzymatically or metabolically modified forms of 
polynucleotides, as well as the chemical forms of DNA and RNA characteristic of viruses and 
cells, including, for example, simple and complex cells. "Polynucleotide(s)" also embraces 
short polynucleotides often referred to as oligonucleotide(s). 

"Polypeptide(s)" refers to any peptide or protein comprising two or more amino acids 
joined to each other by peptide bonds or modified peptide bonds. "Polypeptide(s)" refers to 
both short chains, commonly referred to as peptides, oligopeptides and oligomers and to 
longer chains generally referred to as proteins. Polypeptides may contain amino acids other 
than the 20 gene-encoded amino acids. "Polypeptide(s)" include those modified either by
natural processes, such as processing and other post-translational modifications, or by
chemical modification techniques. Such modifications are well described in basic texts and in 
more detailed monographs, as well as in a voluminous research literature, and they are well 
known to those of skill in the art. It will be appreciated that the same type of modification 
may be present in the same or varying degree at several sites in a given polypeptide. Also, a 
given polypeptide may contain many types of modifications. Modifications can occur 
anywhere in a polypeptide, including the peptide backbone, the amino acid side-chains, and 
the amino or carboxyl termini. Modifications include, for example, acetylation, acylation, 
ADP-ribosylation, amidation, covalent attachment of flavin, covalent attachment of a heme 
moiety, covalent attachment of a nucleotide or nucleotide derivative, covalent attachment of a 
lipid or lipid derivative, covalent attachment of phosphatidylinositol, cross-linking,
cyclization, disulfide bond formation, demethylation, formation of covalent cross-links,
formation of cysteine, formation of pyroglutamate, formylation,
glycosylation, GPI anchor formation, hydroxylation, iodination, methylation, myristoylation,
oxidation, proteolytic processing, phosphorylation, prenylation, racemization,
lipid attachment, sulfation, gamma-carboxylation of glutamic acid residues,
selenoylation, transfer-RNA mediated addition of amino
acids to proteins, such as arginylation, and ubiquitination. See, for instance, PROTEINS - 
STRUCTURE AND MOLECULAR PROPERTIES, 2nd Ed., T. E. Creighton, W. H. Freeman 
and Company, New York (1993) and Wold, F., Posttranslational Protein Modifications: 
Perspectives and Prospects, pgs. 1-12 in POSTTRANSLATIONAL COVALENT 
MODIFICATION OF PROTEINS, B. C. Johnson, Ed., Academic Press, New York (1983); 
Seifter et al., Meth. Enzymol. 182:626-646 (1990) and Rattan et al., Protein Synthesis:
Posttranslational Modifications and Aging, Ann. N.Y. Acad. Sci. 663: 48-62 (1992).
Polypeptides may be branched, or cyclic with or without branching. Cyclic, branched and
branched circular polypeptides may result from post-translational natural processes and may
also be made by entirely synthetic methods.

"Variant(s)" as the term is used herein, is a polynucleotide or polypeptide that 
differs from a reference polynucleotide or polypeptide respectively, but retains essential 
properties. A typical variant of a polynucleotide differs in nucleotide sequence from 
another, reference polynucleotide. Changes in the nucleotide sequence of the variant may 
or may not alter the amino acid sequence of a polypeptide encoded by the reference 
polynucleotide. Nucleotide changes may result in amino acid substitutions, additions, 
deletions, fusions and truncations in the polypeptide encoded by the reference sequence, as 
discussed below. A typical variant of a polypeptide differs in amino acid sequence from 
another, reference polypeptide. Generally, differences are limited so that the sequences of 
the reference polypeptide and the variant are closely similar overall and, in many regions, 
identical. A variant and reference polypeptide may differ in amino acid sequence by one or 
more substitutions, additions and deletions in any combination. A substituted or inserted
amino acid residue may or may not be one encoded by the genetic code. A variant of a
polynucleotide or polypeptide may be naturally occurring, such as an allelic variant, or it
may be a variant that is not known to occur naturally. Non-naturally occurring variants of 
polynucleotides and polypeptides may be made by mutagenesis techniques, by direct 
synthesis, and by other recombinant methods known to skilled artisans. 
DESCRIPTION OF THE INVENTION 

Each of the polynucleotide and polypeptide sequences provided herein may be used in
the discovery and development of antibacterial compounds. Upon expression of the
sequences with the appropriate initiation and termination codons, the encoded polypeptide
can be used as a target for the screening of antimicrobial drugs. Additionally, the DNA
sequences encoding, preferably, the amino terminal regions of the encoded protein or the
Shine-Dalgarno region can be used to construct antisense sequences to control the
expression of the coding sequence of interest. Furthermore, many of the sequences 
disclosed herein also provide regions upstream and downstream from the encoding 
sequence. These sequences are useful as a source of regulatory elements for the control of 
bacterial gene expression. Such sequences are conveniently isolated by restriction enzyme 
action or synthesized chemically and introduced, for example, into promoter identification 
strains. These strains contain a reporter structural gene sequence located downstream from 
a restriction site such that if an active promoter is inserted, the reporter gene will be 
expressed. 
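Constructing an antisense sequence to the amino-terminal coding region amounts to taking the reverse complement of that stretch of the coding strand. A minimal Python sketch, assuming a DNA coding strand written 5' to 3' and a known 0-based position of the initiation codon; the function names and window sizes are hypothetical defaults, not values taken from this disclosure.

```python
# Watson-Crick complements for DNA, upper and lower case
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence, read 5' to 3'."""
    return dna.translate(_COMPLEMENT)[::-1]


def antisense_to_start_region(
    coding_strand: str, start_index: int, upstream: int = 10, downstream: int = 15
) -> str:
    """Antisense sequence covering `upstream` nucleotides before the initiation
    codon (which may include part of the Shine-Dalgarno region) and `downstream`
    nucleotides from the initiation codon onward. Window sizes are illustrative
    assumptions only."""
    begin = max(0, start_index - upstream)
    end = min(len(coding_strand), start_index + downstream)
    return reverse_complement(coding_strand[begin:end])


# Hypothetical coding-strand fragment with an ATG start codon at index 12
sense = "AAAAGGAGGTTTATGAAACCC"
print(antisense_to_start_region(sense, 12, upstream=3, downstream=6))  # TTTCATAAA
```

The antisense product is written 5' to 3', so it pairs base-for-base with the selected window of the sense strand.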

Although each of the sequences may be employed as described above, this 
invention also provides several means for identifying particularly useful target genes. The 
first of these approaches entails searching appropriate databases for sequence matches in 
related organisms. Thus, if a homologue exists, the Streptococcal-like form of this gene 
would likely play an analogous role. For example, a Streptococcal protein identified as 
homologous to a cell surface protein in another organism would be useful as a vaccine 
candidate. To the extent such homologies have been identified for the sequences disclosed 
herein they are reported along with the encoding sequence. 

Each of the DNA sequences provided herein may be used in the discovery and 
development of antibacterial compounds. Because each of the sequences contains an open
reading frame (ORF) with appropriate initiation and termination codons, the encoded
protein upon expression can be used as a target for the screening of antimicrobial drugs. 
Additionally, the DNA sequences encoding the amino terminal regions of the encoded 
protein can be used to construct antisense sequences to control the expression of the coding 
sequence of interest. Furthermore, many of the sequences disclosed herein also provide 
regions upstream and downstream from the encoding sequence. These sequences are useful 
as a source of regulatory elements for the control of bacterial gene expression. Such 
sequences are conveniently isolated by restriction enzyme action or synthesized chemically 
and introduced, for example, into promoter identification strains. These strains contain a 
reporter structural gene sequence located downstream from a restriction site such that if an 
active promoter is inserted, the reporter gene will be expressed. 
It is believed that bacteria possess a number of ways of regulating gene expression
levels, especially by subtle degrees, and that the interplay between the ribosome binding site
and the initiation codon is utilized for this purpose for these genes. It is also believed that such
genes will be important targets for antimicrobial drug discovery, particularly since
pathogenesis genes are believed to undergo gene expression regulation during the
pathogenesis process. Therefore, the invention provides ORF sequences possessing a GTG
(GUG) initiation codon and protein targets expressed therefrom.
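To pick out GTG-initiated ORFs of the kind just described, a sequence can be scanned for reading frames that open with ATG or GTG and close at the first in-frame stop codon. The Python sketch below is a simplification for illustration: it scans only the three forward frames, takes the first start codon in each frame, ignores other bacterial start codons, and uses a hypothetical minimum length and hypothetical function names.

```python
START_CODONS = ("ATG", "GTG")
STOP_CODONS = ("TAA", "TAG", "TGA")


def find_orfs(dna: str, min_residues: int = 30) -> list[tuple[int, int, str]]:
    """Return (start, end, start_codon) for forward-strand ORFs.

    `start` is the 0-based index of the initiation codon and `end` is the
    index just past the stop codon. An ORF is kept only if it encodes at least
    `min_residues` amino acids (the stop codon is not counted).
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i : i + 3]
            if start is None and codon in START_CODONS:
                start = i  # first start codon in this frame opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_residues:
                    orfs.append((start, i + 3, dna[start : start + 3]))
                start = None  # look for the next ORF in the same frame
    return sorted(orfs)


def gtg_initiated(orfs: list[tuple[int, int, str]]) -> list[tuple[int, int, str]]:
    """Keep only ORFs whose initiation codon is GTG (GUG in the mRNA)."""
    return [orf for orf in orfs if orf[2] == "GTG"]


# Hypothetical toy sequence: a GTG-initiated ORF followed by an ATG-initiated one
seq = "CC" + "GTG" + "AAA" * 3 + "TAA" + "ATG" + "CCC" + "TGA"
print(gtg_initiated(find_orfs(seq, min_residues=1)))  # [(2, 17, 'GTG')]
```

A genome-scale search would also scan the reverse complement and handle overlapping or nested start codons, which this sketch deliberately omits.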

ORF Gene Expression 

Recently, techniques have become available to evaluate temporal gene expression in
bacteria, particularly as it applies to viability under laboratory and infection conditions. A
number of methods can be used to identify genes which are essential to survival per se, or
essential to the establishment/maintenance of an infection. Identification of an unknown ORF
by one of these methods yields additional information about its function and
permits the selection of such an ORF for further development as a screening target. 
Briefly, these approaches include: 

1) Signature Tagged Mutagenesis (STM): This technique is described by Hensel
et al., Science 269: 400-403 (1995), the contents of which are incorporated by reference for
background purposes. Signature tagged mutagenesis identifies genes necessary for the 
establishment/maintenance of infection in a given infection model. 

The basis of the technique is the random mutagenesis of the target organism by various
means (e.g., transposons) such that unique DNA sequence tags are inserted in close
proximity to the site of mutation. The tags from a mixed population of bacterial mutants
and from bacteria recovered from infected hosts are detected by amplification, radiolabeling
and hybridisation analysis. Mutants attenuated in virulence are revealed by absence of the 
tag from the pool of bacteria recovered from infected hosts. 
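The final comparison step, finding tags present in the input pool but missing after passage through the host, can be sketched in Python as below. This assumes each tag's hybridization signal has already been quantified and normalized to a 0 to 1 scale; the function name, tag names and presence threshold are hypothetical.

```python
def attenuated_mutants(
    input_signal: dict[str, float],
    output_signal: dict[str, float],
    threshold: float = 0.1,
) -> list[str]:
    """Tags detected in the input (inoculum) pool but not among bacteria
    recovered from infected hosts; the corresponding mutants are candidate
    attenuated-virulence mutants."""
    detected_in = {tag for tag, s in input_signal.items() if s >= threshold}
    detected_out = {tag for tag, s in output_signal.items() if s >= threshold}
    return sorted(detected_in - detected_out)


inoculum = {"tag01": 1.0, "tag02": 0.9, "tag03": 0.8}
recovered = {"tag01": 0.95, "tag02": 0.02}  # tag03 not detected at all
print(attenuated_mutants(inoculum, recovered))  # ['tag02', 'tag03']
```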
In Streptococcus pneumoniae, because the transposon system is less well
developed, a more efficient way of creating the tagged mutants is to use the insertion-
duplication mutagenesis technique as described by Morrison et al., J. Bacteriol. 159:870
(1984), the contents of which are incorporated by reference for background purposes.

2) In Vivo Expression Technology (IVET): This technique is described by
Camilli et al., Proc. Natl. Acad. Sci. USA 91:2634-2638 (1994), the contents of which are
incorporated by reference for background purposes. IVET identifies genes up-regulated
during infection when compared to laboratory cultivation, implying an important role in
infection. ORFs identified by this technique are implied to have a significant role in
infection establishment/maintenance.

In this technique, random chromosomal fragments of the target organism are cloned
upstream of a promoter-less recombinase gene in a plasmid vector. This construct is
introduced into the target organism, which carries an antibiotic resistance gene flanked by
resolvase sites. Growth in the presence of the antibiotic removes from the population those
clones whose fragments support transcription of the recombinase gene during this growth
and which have therefore lost antibiotic resistance. The resistant
pool is introduced into a host, and at various times after infection bacteria may be recovered
and assessed for the presence of antibiotic resistance. The chromosomal fragment carried
by each antibiotic-sensitive bacterium should carry a promoter or portion of a gene
normally upregulated during infection. Sequencing upstream of the recombinase gene
allows identification of the upregulated gene.
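The selection logic can be summarized as a filter over recovered clones: after growth in antibiotic, every clone in the pool is resistant, so a clone that has become sensitive after passage through the host carries a fragment that drove recombinase expression during infection. A Python sketch, assuming each recovered clone has been scored for resistance at a known time point; the data structure, field names and fragment identifiers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveredClone:
    fragment_id: str           # hypothetical label for the cloned chromosomal fragment
    hours_post_infection: int  # time at which the clone was recovered from the host
    resistant: bool            # antibiotic resistance still present after recovery


def in_vivo_induced(clones: list[RecoveredClone]) -> dict[str, int]:
    """Map each fragment whose clone lost resistance in the host to the
    earliest time point at which a sensitive clone carrying it was recovered."""
    earliest: dict[str, int] = {}
    for clone in clones:
        if not clone.resistant:  # resolution occurred during infection
            t = earliest.get(clone.fragment_id)
            if t is None or clone.hours_post_infection < t:
                earliest[clone.fragment_id] = clone.hours_post_infection
    return earliest


clones = [
    RecoveredClone("fragA", 24, False),
    RecoveredClone("fragA", 6, False),
    RecoveredClone("fragB", 24, True),
    RecoveredClone("fragC", 48, False),
]
print(in_vivo_induced(clones))  # {'fragA': 6, 'fragC': 48}
```

Recording the earliest time point reflects the text's suggestion of sampling at various times after infection; how those times are chosen is left open.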

3) Differential display: This technique is described by Chuang et al., J.
Bacteriol. 175:2026-2036 (1993), the contents of which are incorporated by reference for
background purposes. This method identifies those genes which are expressed in an
organism by identifying the mRNA present using randomly primed RT-PCR. By comparing
pre-infection and post-infection profiles, genes up- and down-regulated during infection can
be identified and the RT-PCR products sequenced and matched to ORF 'unknowns'.
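Comparing pre- and post-infection profiles comes down to a per-band ratio. A Python sketch, assuming band intensities from each display have already been quantified on a common scale; the function name, the two-fold cutoff and the floor value standing in for an undetected band are hypothetical choices.

```python
def classify_bands(
    pre: dict[str, float],
    post: dict[str, float],
    min_fold: float = 2.0,
    floor: float = 1.0,
) -> dict[str, str]:
    """Label each RT-PCR band as 'up', 'down' or 'unchanged' during infection.

    Missing or very weak bands are raised to `floor` so that a band seen in
    only one profile still yields a finite ratio.
    """
    labels = {}
    for band in sorted(set(pre) | set(post)):
        before = max(pre.get(band, 0.0), floor)
        after = max(post.get(band, 0.0), floor)
        ratio = after / before
        if ratio >= min_fold:
            labels[band] = "up"
        elif ratio <= 1 / min_fold:
            labels[band] = "down"
        else:
            labels[band] = "unchanged"
    return labels


pre_infection = {"b1": 10.0, "b2": 10.0, "b3": 10.0}
post_infection = {"b1": 40.0, "b2": 2.0, "b3": 12.0, "b4": 8.0}  # b4 appears only after infection
print(classify_bands(pre_infection, post_infection))
```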
4) Generation of conditional lethal mutants by transposon mutagenesis:

This technique, described by de Lorenzo, V. et al., Gene 123:17-24 (1993); Neuwald,
A. F. et al., Gene 125: 69-73 (1993); and Takiff, H. E. et al., J. Bacteriol. 174:1544-
1553 (1992), the contents of which are incorporated by reference for background
purposes, identifies genes whose expression is essential for cell viability.

In this technique, transposons carrying controllable promoters, which provide
transcription outward from the transposon in one or both directions, are generated.
Random insertion of these transposons into target organisms and subsequent isolation of
insertion mutants in the presence of an inducer of promoter activity ensures that insertions
which separate the promoter from the coding region of a gene whose expression is essential
for cell viability will be recovered. Subsequent replica plating in the absence of inducer
identifies such insertions, since they fail to survive. Sequencing of the flanking regions of the
transposon allows identification of the site of insertion and of the gene disrupted.
Close monitoring of the changes in cellular processes/morphology during growth in the
absence of inducer yields information on the likely function of the gene. Such monitoring
could include flow cytometry (cell division, lysis, redox potential, DNA replication),
incorporation of radiochemically labeled precursors into DNA, RNA, protein, lipid and
peptidoglycan, and monitoring of reporter enzyme gene fusions which respond to known
cellular stresses.
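The replica-plating step identifies insertions that grow only when the outward-reading promoter is induced. A Python sketch of that comparison, assuming growth on each plate has been scored as yes/no per insertion mutant; the function name and mutant identifiers are hypothetical.

```python
def essential_gene_insertions(
    with_inducer: dict[str, bool], without_inducer: dict[str, bool]
) -> list[str]:
    """Insertion mutants that grew with inducer but failed to grow on the
    replica plate lacking inducer; mutants not scored on both plates are skipped."""
    return sorted(
        mutant
        for mutant, grew in with_inducer.items()
        if grew and mutant in without_inducer and not without_inducer[mutant]
    )


plus_inducer = {"m1": True, "m2": True, "m3": True, "m4": False}
minus_inducer = {"m1": True, "m2": False, "m4": False}  # m3 not scored on this plate
print(essential_gene_insertions(plus_inducer, minus_inducer))  # ['m2']
```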

5) Generation of conditional lethal mutants by chemical mutagenesis: This
technique is described by Beckwith, J., Methods in Enzymology 204: 3-18 (1991), the
contents of which are incorporated herein by reference for background purposes. In this
technique, random chemical mutagenesis of the target organism, growth at a temperature
other than physiological temperature (permissive temperature) and subsequent replica
plating and growth at a different temperature (e.g., 42°C to identify temperature-sensitive
(ts) mutants, 25°C to identify cold-sensitive (cs) mutants) are used to identify those isolates
which now fail to grow (conditional mutants). As above, close monitoring of the changes
upon growth at the non-permissive temperature yields information on the function of the
mutated gene. Complementation of the conditional lethal mutation by a library from the
target organism and sequencing of the complementing gene allows matching with an
unknown ORF.
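Scoring replica plates at the two restrictive temperatures named above sorts isolates into conditional classes. A Python sketch, assuming growth has been scored as yes/no on the permissive plate and at 42°C and 25°C; the function name and the returned labels are hypothetical.

```python
def conditional_phenotype(permissive: bool, at_42c: bool, at_25c: bool) -> str:
    """Classify an isolate from replica-plating results.

    'ts' = temperature-sensitive (no growth at 42°C),
    'cs' = cold-sensitive (no growth at 25°C).
    """
    if not permissive:
        return "not recovered"  # no colony to replica-plate from
    classes = []
    if not at_42c:
        classes.append("ts")
    if not at_25c:
        classes.append("cs")
    return ",".join(classes) if classes else "not conditional"


print(conditional_phenotype(True, False, True))  # ts
```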

6) RT-PCR: Streptococcus pneumoniae messenger RNA is isolated from bacterially
infected tissue, e.g., 48-hour murine lung infections, and the amount of each mRNA species is
assessed by reverse transcription of the RNA sample primed with random hexanucleotides
followed by PCR with gene-specific primer pairs. The determination of the presence and
amount of a particular mRNA species by quantification of the resultant PCR product
provides information on the bacterial genes which are transcribed in the infected tissue.
Analysis of gene transcription can be carried out at different times of infection to gain a
detailed knowledge of gene regulation in bacterial pathogenesis, allowing a clearer
understanding of which gene products represent targets for screens for novel
antibacterials. Because of the gene-specific nature of the PCR primers employed, it should
be understood that the bacterial mRNA preparation need not be free of mammalian RNA.
This allows the investigator to carry out a simple and quick RNA preparation from
infected tissue to obtain bacterial mRNA species which are very short-lived in the
bacterium (on the order of 2-minute half-lives). Optimally, the bacterial mRNA is prepared
from infected murine lung tissue by mechanical disruption in the presence of TRIzol
(GIBCO-BRL) for very short periods of time, subsequent processing according to the
manufacturer's instructions for TRIzol reagent, and DNase treatment to remove
contaminating DNA. Preferably, the process is optimised by finding those conditions which
give a maximum amount of Streptococcus pneumoniae 16S ribosomal RNA as detected by
probing Northern blots with a suitably labelled sequence-specific oligonucleotide probe.
Typically, a 5' dye-labelled primer is used in each PCR primer pair in a PCR reaction which
is terminated optimally between 8 and 25 cycles. The PCR products are separated on 6%
polyacrylamide gels with detection and quantification using GeneScanner (manufactured
by ABI).
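One plausible reason for stopping the reactions early, offered here as an assumption rather than as the stated rationale of this disclosure, is that the amount of product tracks the starting template only while amplification is still close to exponential. The toy Python model below (a logistic growth law with a hypothetical per-cycle efficiency and product capacity) shows a 10-fold template difference preserved after 15 cycles and erased once both reactions reach a plateau.

```python
def amplify(initial_copies: float, cycles: int,
            efficiency: float = 1.0, capacity: float = 1e12) -> float:
    """Toy PCR model: each cycle adds efficiency * N * (1 - N / capacity)
    copies, so growth is near-exponential early and saturates near `capacity`.
    Both parameters are illustrative assumptions, not measured values."""
    n = float(initial_copies)
    for _ in range(cycles):
        n += efficiency * n * (1.0 - n / capacity)
    return n


# Ratio of products from templates differing 10-fold: stays near 10 at
# 15 cycles and collapses toward 1 once both reactions have saturated.
for cycles in (15, 40):
    ratio = amplify(1e4, cycles) / amplify(1e3, cycles)
    print(cycles, round(ratio, 2))
```

Where exactly proportionality breaks down depends on template abundance and reaction conditions, so the 8 to 25 cycle window in the text is not derived from this model.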

Each of these techniques may have advantages or disadvantages depending on the
particular application. The skilled artisan would choose the approach that is most
relevant to the particular end use in mind.

Use of these technologies when applied to the ORFs of the present invention
enables identification of bacterial proteins expressed during infection, inhibitors of which 
would have utility in anti-bacterial therapy. 

The invention relates to novel polypeptides and polynucleotides as described in 
greater detail below. In particular, the invention relates to polypeptides and polynucleotides 
of Streptococcus pneumoniae, which are related by amino acid sequence homology to known
polypeptides as set forth in Table 1. The invention relates especially to compounds having the
nucleotide and amino acid sequences selected from the group consisting of the sequences set
out in Table 1, and to the nucleotide sequences of the DNA in the deposited strain and amino
acid sequences encoded thereby. 
Deposited materials 

The deposit has been made under the terms of the Budapest Treaty on the 
International Recognition of the Deposit of Micro-organisms for Purposes of Patent 
Procedure. The strain will be irrevocably and without restriction or condition released to the 
public upon the issuance of a patent. The deposit is provided merely as a convenience to those
of skill in the art and is not an admission that a deposit is required for enablement, such as that
required under 35 U.S.C. §112.

A deposit containing a Streptococcus pneumoniae bacterial strain has been deposited 
with the National Collections of Industrial and Marine Bacteria Ltd. (NCIMB), 23 St. 
Machar Drive, Aberdeen AB2 1RY, Scotland on 11 April 1996 and assigned NCIMB 
Deposit No. 40794. The Streptococcus pneumoniae bacterial strain deposit is referred to 
herein as "the deposited bacterial strain" or as "the DNA of the deposited bacterial strain." 

The deposited material is a bacterial strain that contains the full length FabH DNA, 
referred to as "NCIMB 40794" upon deposit. 

The sequence of the polynucleotides contained in the deposited material, as well as 
the amino acid sequence of the polypeptide encoded thereby, are controlling in the event of 
any conflict with any description of sequences herein. 

A license may be required to make, use or sell the deposited materials, and no such 
license is hereby granted. 

The deposited strain contains the full length genes comprising the polynucleotides set
forth in Table 1.

Polypeptides 

The polypeptides of the invention include the polypeptides set forth in Table 1 (in 
particular the mature polypeptide) as well as polypeptides and fragments, particularly those 
which have the biological activity of a polypeptide of the invention, and also those which have 
at least 50%, 60% or 70% identity to a polypeptide sequence selected from the group 
consisting of the sequences set out in Table 1 or the relevant portion, preferably at least 80% 
identity to a polypeptide sequence selected from the group consisting of the sequences set out 
in Table 1, and more preferably at least 90% similarity (more preferably at least 90% identity) 
to a polypeptide sequence selected from the group consisting of the sequences set out in Table
1, and still more preferably at least 95% similarity (still more preferably at least 95% identity) 
to a polypeptide sequence selected from the group consisting of the sequences set out in Table 
1, and also include portions of such polypeptides with such portion of the polypeptide 
generally containing at least 30 amino acids and more preferably at least 50 amino acids. 
The invention also includes polypeptides of the formula:

$X-(R_1)_n-(R_2)-(R_3)_n-Y$

wherein, at the amino terminus, X is hydrogen, and at the carboxyl terminus, Y is hydrogen or
a metal, $R_1$ and $R_3$ are any amino acid residue, n is an integer between 1 and 2000, and $R_2$ is
an amino acid sequence of the invention, particularly an amino acid sequence selected from
the group set forth in Table 1. In the formula above, $R_2$ is oriented so that its amino terminal
residue is at the left, bound to $R_1$, and its carboxy terminal residue is at the right, bound to $R_3$.
Any stretch of amino acid residues denoted by either R group, where n is greater than 1, may
be either a heteropolymer or a homopolymer, preferably a heteropolymer. In preferred
embodiments n is an integer between 1 and 1000 or 2000.
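Read as a string of one-letter codes, the formula places $R_2$ between n residues of $R_1$ and n residues of $R_3$, with X and Y as terminal groups. The Python sketch below builds one such sequence, assuming random flanking residues drawn from the twenty standard amino acids; the helper name and the example core sequence are hypothetical, and the terminal groups X and Y are not represented in the one-letter string.

```python
import random
from typing import Optional

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the twenty standard one-letter codes


def build_formula_polypeptide(
    r2: str, n: int, rng: Optional[random.Random] = None
) -> str:
    """Return the one-letter sequence (R1)n-R2-(R3)n.

    R1 and R3 are each n residues chosen at random here (an illustrative
    assumption; the formula allows any residues). R2 keeps its N-terminus on
    the left, bound to R1, and its C-terminus on the right, bound to R3.
    """
    if not 1 <= n <= 2000:
        raise ValueError("n must be an integer between 1 and 2000")
    if rng is None:
        rng = random.Random()
    r1 = "".join(rng.choice(AMINO_ACIDS) for _ in range(n))
    r3 = "".join(rng.choice(AMINO_ACIDS) for _ in range(n))
    return r1 + r2 + r3


seq = build_formula_polypeptide("MKTAYIAK", n=5, rng=random.Random(0))
print(len(seq), seq[5:13])  # 18 MKTAYIAK
```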

A fragment is a variant polypeptide having an amino acid sequence that entirely is the 
same as part but not all of the amino acid sequence of the aforementioned polypeptides. As 
with polypeptides, fragments may be "free-standing," or comprised within a larger 
polypeptide of which they form a part or region, most preferably as a single continuous
region in a single larger polypeptide.

Preferred fragments include, for example, truncation polypeptides having a portion of 
the amino acid sequence of Table 1, or of variants thereof, such as a continuous series of
residues that includes the amino terminus, or a continuous series of residues that includes the 
carboxyl terminus. Degradation forms of the polypeptides of the invention in a host cell, 
particularly a Streptococcus pneumoniae, are also preferred. Further preferred are fragments 
characterized by structural or functional attributes such as fragments that comprise alpha-helix 
and alpha-helix forming regions, beta-sheet and beta-sheet-forming regions, turn and turn- 
forming regions, coil and coil-forming regions, hydrophilic regions, hydrophobic regions, 
alpha amphipathic regions, beta amphipathic regions, flexible regions, surface-forming 
regions, substrate binding regions, and high antigenic index regions.

Also preferred are biologically active fragments which are those fragments that 
mediate activities of polypeptides of the invention, including those with a similar activity or 
an improved activity, or with a decreased undesirable activity. Also included are those 
fragments that are antigenic or immunogenic in an animal, especially in a human. Particularly
preferred are fragments comprising receptors or domains of enzymes that confer a function
essential for viability of Streptococcus pneumoniae or the ability to initiate or maintain
disease in an individual, particularly a human.

Variants that are fragments of the polypeptides of the invention may be employed for 
producing the corresponding full-length polypeptide by peptide synthesis; therefore, these 
variants may be employed as intermediates for producing the full-length polypeptides of the 
invention. 

In addition to the standard single and triple letter representations for amino acids,
the term "X" or "Xaa" is also used. "X" and "Xaa" mean that any of the twenty naturally
occurring amino acids may appear at such a designated position in the polypeptide sequence.
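When searching sequences for a motif written with this convention, each "X" simply stands for a one-position wildcard over the twenty standard residues. A Python sketch, assuming one-letter notation only (a three-letter "Xaa" string would first need converting) and hypothetical function names.

```python
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def motif_to_regex(motif: str) -> re.Pattern:
    """Compile a one-letter motif in which 'X' matches any one of the twenty
    naturally occurring amino acids and every other letter matches itself."""
    parts = [f"[{AMINO_ACIDS}]" if ch == "X" else re.escape(ch) for ch in motif]
    return re.compile("".join(parts))


def find_motif(motif: str, sequence: str) -> list[int]:
    """0-based start positions of non-overlapping motif matches in `sequence`."""
    return [m.start() for m in motif_to_regex(motif).finditer(sequence)]


print(find_motif("CXXC", "MACGHCKLCPEC"))  # [2, 8]
```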

Polynucleotides 

The nucleotide sequences disclosed herein can be obtained by synthetic chemical 
techniques known in the art or can be obtained from S. pneumoniae 0100993 by probing a 
DNA preparation with probes constructed from the particular sequences disclosed herein. 
Alternatively, oligonucleotides derived from a disclosed sequence can act as PCR primers 
in a process of PCR-based cloning of the sequence from a bacterial genomic source. It is 
recognised that such sequences will also have utility in diagnosis of the stage of infection 
and type of infection the pathogen has attained. 
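As a sketch of how PCR primers might be taken from the ends of a disclosed sequence, the Python below returns the first nucleotides of the target as the forward primer and the reverse complement of the last nucleotides as the reverse primer. The function names and the 20-nt default length are assumptions; practical primer design would also weigh melting temperature, GC content and uniqueness in the genome, none of which is modelled here.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return dna.translate(_COMPLEMENT)[::-1]


def end_primers(target: str, length: int = 20) -> tuple[str, str]:
    """Forward and reverse primers (both written 5' to 3') flanking `target`."""
    target = target.upper()
    if len(target) < 2 * length:
        raise ValueError("target is shorter than two primer lengths")
    forward = target[:length]                       # same sequence as the top strand
    reverse = reverse_complement(target[-length:])  # anneals to the top strand
    return forward, reverse


fwd, rev = end_primers("ATGAAACCCGGGTTTTAA", length=6)
print(fwd, rev)  # ATGAAA TTAAAA
```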

To obtain the polynucleotide encoding the protein using the DNA sequence given 
herein, typically a library of clones of chromosomal DNA of S. pneumoniae 0100993 in E.
coli or some other suitable host is probed with a radiolabeled oligonucleotide, preferably a 
17mer or longer, derived from the partial sequence. Clones carrying DNA identical to that 
of the probe can then be distinguished using high stringency washes. By sequencing the 
individual clones thus identified with sequencing primers designed from the original 
sequence it is then possible to extend the sequence in both directions to determine the full 
gene sequence. Conveniently such sequencing is performed using denatured double 
stranded DNA prepared from a plasmid clone. Suitable techniques are described by 
Maniatis, T., Fritsch, E.F. and Sambrook, J. in MOLECULAR CLONING, A Laboratory 
Manual, 2nd edition, 1989, Cold Spring Harbor Laboratory (see: Screening By 
Hybridization 1.90 and Sequencing Denatured Double-Stranded DNA Templates 13.70). 

Moreover, another aspect of the invention relates to isolated polynucleotides that
encode the polypeptides of the invention having a deduced amino acid sequence selected from
the group consisting of the sequences in Table 1 and polynucleotides closely related thereto
and variants thereof. 

Using the information provided herein, such as the polynucleotide sequences set out
in Table 1, a polynucleotide of the invention encoding a polypeptide may be obtained using
standard cloning and screening methods, such as those for cloning and sequencing
chromosomal DNA fragments from bacteria using Streptococcus pneumoniae 0100993 cells
as starting material, followed by obtaining a full-length clone. For example, to obtain a
polynucleotide sequence of the invention, such as a sequence set forth in Table 1, a
library of clones of chromosomal DNA of Streptococcus pneumoniae 0100993 in E. coli
or some other suitable host is typically probed with a radiolabeled oligonucleotide,
preferably a 17-mer or longer, derived from a partial sequence. Clones carrying DNA
identical to that of the probe can then be distinguished using stringent conditions. By
sequencing the individual clones thus identified with sequencing primers designed from the
original sequence, it is then possible to extend the sequence in both directions to determine
the full gene sequence. Conveniently, such sequencing is performed using denatured
double-stranded DNA prepared from a plasmid clone. Suitable techniques are described by
Maniatis, T., Fritsch, E.F. and Sambrook, J., MOLECULAR CLONING, A LABORATORY
MANUAL, 2nd Ed.; Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York
(1989) (see in particular Screening By Hybridization 1.90 and Sequencing Denatured
Double-Stranded DNA Templates 13.70). Illustrative of the invention, the polynucleotides
set out in Table 1 were discovered in a DNA library derived from Streptococcus pneumoniae
0100993.

The DNA sequences set out in Table 1 each contain at least one open reading frame
encoding a protein having at least about the number of amino acid residues set forth in
Table 1. The start and stop codons of each open reading frame (herein "ORF") are the first
three and the last three nucleotides, respectively, of each polynucleotide set forth in Table 1.

Certain polynucleotides and polypeptides of the invention are structurally related to
known proteins as set forth in Table 1. Among the known proteins, each of these exhibits the
greatest homology to the homologue listed for it in Table 1.

The invention provides a polynucleotide sequence identical over its entire length to
each coding sequence in Table 1. Also provided by the invention is the coding sequence for
the mature polypeptide or a fragment thereof, by itself, as well as the coding sequence for the
mature polypeptide or a fragment in reading frame with other coding sequences, such as those
encoding a leader or secretory sequence, or a pre-, pro- or prepro-protein sequence. The
polynucleotide may also contain non-coding sequences, including, for example but not
limited to, non-coding 5' and 3' sequences, such as the transcribed, non-translated sequences,
termination signals, ribosome binding sites, sequences that stabilize mRNA, introns,
polyadenylation signals, and additional coding sequences that encode additional amino
acids. For example, a marker sequence that facilitates purification of the fused polypeptide
can be encoded. In certain embodiments of the invention, the marker sequence is a
hexa-histidine peptide, as provided in the pQE vector (Qiagen, Inc.) and described in Gentz et al.,
Proc. Natl. Acad. Sci. USA 86: 821-824 (1989), or an HA tag (Wilson et al., Cell 37: 767
(1984)). Polynucleotides of the invention also include, but are not limited to, polynucleotides
comprising a structural gene and its naturally associated sequences that control gene
expression.

The invention also includes polynucleotides of the formula:

$$X-(R_1)_n-(R_2)-(R_3)_n-Y$$

wherein, at the 5' end of the molecule, X is hydrogen, and at the 3' end of the molecule, Y is
hydrogen or a metal; R1 and R3 are each any nucleic acid residue; n is an integer between 1
and 3000; and R2 is a nucleic acid sequence of the invention, particularly a nucleic acid
sequence selected from the group set forth in Table 1. In the polynucleotide formula above,
R2 is oriented so that its 5' end residue is at the left, bound to R1, and its 3' end residue is at
the right, bound to R3. Any stretch of nucleic acid residues denoted by either R group, where
n is greater than 1, may be either a heteropolymer or a homopolymer, preferably a
heteropolymer. In a preferred embodiment, n is an integer between 1 and 1000, 2000 or 3000.
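To make the formula concrete, the sketch below assembles the nucleic acid portion (R1)n-R2-(R3)n as a string. It is a hypothetical illustration with several simplifications. The terminal groups X and Y (hydrogen, or hydrogen or a metal) are chemical end groups and are not represented. The flanking residues are drawn at random. The function names are invented for this example.

```python
import random
from typing import Optional

NUCLEOTIDES = "ACGT"


def assemble_formula(r2: str, n: int, seed: Optional[int] = None) -> str:
    """Return (R1)n + R2 + (R3)n, with R1 and R3 stretches of n random residues each.

    X (5' hydrogen) and Y (3' hydrogen or metal) are end groups, not sequence,
    so they are omitted from the returned string.
    """
    if not 1 <= n <= 3000:
        raise ValueError("n must be an integer between 1 and 3000")
    rng = random.Random(seed)
    r1 = "".join(rng.choice(NUCLEOTIDES) for _ in range(n))
    r3 = "".join(rng.choice(NUCLEOTIDES) for _ in range(n))
    # R2 keeps its 5'->3' orientation: its 5' residue joins R1, its 3' residue joins R3.
    return r1 + r2 + r3


def is_homopolymer(stretch: str) -> bool:
    """True when a flanking stretch consists of a single repeated residue."""
    return len(set(stretch)) == 1
```

Because the formula uses the same n for both flanks, the R2 core always sits at offset n in the returned string. `is_homopolymer` can be used to discard the occasional homopolymeric flank if a heteropolymer is wanted.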

The term "polynucleotide encoding a polypeptide" as used herein encompasses 
polynucleotides that include a sequence encoding a polypeptide of the invention, particularly a 
bacterial polypeptide and more particularly a polypeptide of Streptococcus pneumoniae
having an amino acid sequence set out in Table 1. The term also encompasses
polynucleotides that include a single continuous region or discontinuous regions encoding the 
polypeptide (for example, interrupted by integrated phage or an insertion sequence or editing) 
together with additional regions, that also may contain coding and/or non-coding sequences. 

The invention further relates to variants of the polynucleotides described herein that
encode variants of the polypeptide having the deduced amino acid sequence of Table 1.
Variants that are fragments of the polynucleotides of the invention may be used to synthesize 
full-length polynucleotides of the invention. 
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Further particularly preferred embodiments are polynucleotides encoding polypeptide 
variants, that have the amino acid sequence of a polypeptide of Table 1 in which several, a 
few, 5 to 10, 1 to 5, 1 to 3, 2, 1 or no amino acid residues are substituted, deleted or added, in 
any combination. Especially preferred among these are silent substitutions, additions and 
deletions, that do not alter the properties and activities of such polynucleotide. 
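A variant's range (for example, 1 to 5 residues substituted, deleted or added in any combination) can be checked by counting the minimum number of such single-residue changes, that is, the Levenshtein edit distance between the two protein sequences. This sketch assumes that measure is appropriate. It counts changes only and does not judge whether they are silent. The sequences in the test lines are made up.

```python
def residue_changes(reference: str, variant: str) -> int:
    """Minimum substitutions + deletions + additions turning reference into variant."""
    # previous[j]: distance between the reference prefix processed so far and variant[:j].
    previous = list(range(len(variant) + 1))
    for i, ref_res in enumerate(reference, start=1):
        current = [i]  # delete all i reference residues seen so far
        for j, var_res in enumerate(variant, start=1):
            current.append(min(
                previous[j] + 1,                         # deletion of ref_res
                current[j - 1] + 1,                      # addition of var_res
                previous[j - 1] + (ref_res != var_res),  # substitution, or match at no cost
            ))
        previous = current
    return previous[-1]


def within_range(reference: str, variant: str, low: int = 1, high: int = 5) -> bool:
    """True if the variant differs by between `low` and `high` residue changes."""
    return low <= residue_changes(reference, variant) <= high
```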

Further preferred embodiments of the invention are polynucleotides that are at least 
50%, 60% or 70% identical over their entire length to a polynucleotide encoding a 
polypeptide having the amino acid sequence set out in Table 1, and polynucleotides that are 
complementary to such polynucleotides. Most highly preferred are
polynucleotides that comprise a region that is at least 80% identical over its entire length to a
polynucleotide encoding a polypeptide of the deposited strain, and polynucleotides
complementary thereto. In this regard, polynucleotides at least 90% identical over their entire
length to the same are particularly preferred, and among these particularly preferred
polynucleotides, those with at least 95% identity are especially preferred. Furthermore, those
with at least 97% are highly preferred among those with at least 95%, and among these, those
with at least 98% and at least 99% are particularly highly preferred, with at least 99% being
the more preferred.
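The identity thresholds above are percentages of identical positions. This sketch assumes one common convention: align the two sequences first, with gaps written as "-", and divide the number of identical positions by the alignment length. Other conventions divide by the length of one of the sequences, so the resulting values can differ. The alignment step itself (for example, with a global-alignment program) is not shown. The tier list simply restates the percentages named above.

```python
from typing import Optional

# Thresholds restated from the text, checked from the highest down.
IDENTITY_TIERS = (99, 98, 97, 95, 90, 80, 70, 60, 50)


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns where both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("expected two non-empty aligned sequences of equal length")
    identical = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    return 100.0 * identical / len(aligned_a)


def highest_tier(identity: float) -> Optional[int]:
    """Highest listed threshold that `identity` meets, or None if below 50%."""
    for tier in IDENTITY_TIERS:
        if identity >= tier:
            return tier
    return None
```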

A preferred embodiment is an isolated polynucleotide comprising a polynucleotide
sequence selected from the group consisting of: a polynucleotide having at least 50%
identity to a polynucleotide encoding a polypeptide comprising the amino acid sequence of 
Table 1 and obtained from a prokaryotic species other than S. pneumoniae; and a 
polynucleotide encoding a polypeptide comprising an amino acid sequence which is at least 
50% identical to the amino acid sequence of Table 1 and obtained from a prokaryotic species 
other than S. pneumoniae. 

Preferred embodiments are polynucleotides that encode polypeptides that retain 
substantially the same biological function or activity as the mature polypeptide encoded by 
the DNA of Table 1. 

The invention further relates to polynucleotides that hybridize to the above-described
sequences. In this regard, the invention especially relates to polynucleotides that
hybridize under stringent conditions to the above-described polynucleotides. As used
herein, the terms "stringent conditions" and "stringent hybridization conditions" mean that
hybridization will occur only if there is at least 95%, and preferably at least 97%, identity
between the sequences. An example of stringent hybridization conditions is overnight
incubation at 42°C in a solution comprising: 50% formamide, 5x SSC (1x SSC is 150 mM
NaCl, 15 mM trisodium citrate), 50 mM sodium phosphate (pH 7.6), 5x Denhardt's solution,
10% dextran sulfate, and 20 micrograms/ml denatured, sheared salmon sperm DNA, followed
by washing the hybridization support in 0.1x SSC at about 65°C. Hybridization and wash
conditions are well known and exemplified in Sambrook et al., Molecular Cloning: A
Laboratory Manual, Second Edition, Cold Spring Harbor, N.Y. (1989), particularly
Chapter 11 therein.
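SSC concentrations scale linearly with the "x" factor. Assuming the standard 1x SSC composition of 150 mM NaCl and 15 mM trisodium citrate, a short calculation gives the salt levels in the 5x hybridization solution and in the 0.1x wash. The helper name is hypothetical and used for illustration only.

```python
from typing import Dict

# Standard 1x SSC composition, in millimolar.
SSC_1X_MM = {"NaCl": 150.0, "trisodium citrate": 15.0}


def ssc_composition(fold: float) -> Dict[str, float]:
    """Millimolar concentration of each SSC component at the given strength."""
    if fold <= 0:
        raise ValueError("fold must be positive")
    return {component: conc * fold for component, conc in SSC_1X_MM.items()}


for label, fold in (("hybridization (5x)", 5), ("wash (0.1x)", 0.1)):
    print(label, ssc_composition(fold))
```

By this arithmetic, 5x SSC is 750 mM NaCl and 75 mM citrate, and the 0.1x wash is 15 mM NaCl and 1.5 mM citrate. Lowering the salt concentration, together with raising the wash temperature to 65°C, increases stringency.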

The invention also provides a polynucleotide consisting essentially of a 
polynucleotide sequence obtainable by screening an appropriate library containing the 
complete gene for a polynucleotide sequence set forth in Table 1 under stringent 
hybridization conditions with a probe having the sequence of said polynucleotide sequence 
or a fragment thereof; and isolating said DNA sequence. Fragments useful for obtaining 
such a polynucleotide include, for example, probes and primers described elsewhere herein. 

As discussed further herein regarding polynucleotide assays of the invention, the
polynucleotides of the invention discussed above may be used as a hybridization
probe for RNA, cDNA and genomic DNA to isolate full-length cDNAs and genomic clones
encoding a polypeptide, and to isolate cDNA and genomic clones of other genes that have a
high sequence similarity to a polynucleotide set forth in Table 1. Such probes generally will
comprise at least 15 bases. Preferably, such probes will have at least 30 bases and may have
at least 50 bases. Particularly preferred probes will have at least 30 bases and 50
bases or fewer.

For example, the coding region of each gene that comprises or is comprised by a 
polynucleotide set forth in Table 1 may be isolated by screening using a DNA sequence 
provided in Table 1 to synthesize an oligonucleotide probe. A labeled oligonucleotide having 
a sequence complementary to that of a gene of the invention is then used to screen a library of 
cDNA, genomic DNA or mRNA to determine which members of the library the probe 
hybridizes to. 
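A probe complementary to a target region is the reverse complement of that region, written 5' to 3'. The sketch below derives a probe from a window of a Table 1-style sequence and enforces the probe lengths stated above: at least 15 bases, and preferably 30 to 50. The function names and example target are hypothetical. Labeling and hybridization conditions are outside the scope of the sketch.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def design_probe(target: str, start: int, length: int = 30) -> str:
    """Probe that base-pairs with target[start:start + length]."""
    if length < 15:
        raise ValueError("probes should comprise at least 15 bases")
    if start < 0 or start + length > len(target):
        raise ValueError("window extends beyond the target sequence")
    return reverse_complement(target[start:start + length])


def is_preferred_length(probe: str) -> bool:
    """Particularly preferred probes have at least 30 and at most 50 bases."""
    return 30 <= len(probe) <= 50
```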

The polynucleotides and polypeptides of the invention may be employed, for 
example, as research reagents and materials for discovery of treatments of and diagnostics for 
disease, particularly human disease, as further discussed herein relating to polynucleotide 
assays. 

Polynucleotides of the invention that are oligonucleotides derived from a
polynucleotide or polypeptide sequence set forth in Table 1 may be used in the processes
described herein, preferably PCR, to determine whether or not the
polynucleotides identified herein, in whole or in part, are transcribed in bacteria in infected
tissue. It is recognized that such sequences will also have utility in diagnosis of the stage of
infection and type of infection the pathogen has attained.
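To use a Table 1 sequence for PCR, a forward primer is taken from the given strand at the 5' end of the region of interest, and the reverse primer is the reverse complement of the region's 3' end. The amplified product spans both primers. The sketch below uses hypothetical names and a 20-base default primer length chosen for illustration. Practical primer design would also check melting temperature, secondary structure and specificity, which this sketch does not.

```python
from typing import Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def primer_pair(template: str, start: int, end: int,
                primer_len: int = 20) -> Tuple[str, str, int]:
    """Forward primer, reverse primer and expected amplicon length for template[start:end]."""
    if not 0 <= start < end <= len(template):
        raise ValueError("region must lie within the template")
    if end - start < 2 * primer_len:
        raise ValueError("region too short for two non-overlapping primers")
    forward = template[start:start + primer_len]                  # identical to the given strand
    reverse = reverse_complement(template[end - primer_len:end])  # complementary to the given strand
    return forward, reverse, end - start


def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases, a quick first check on a primer."""
    return (primer.count("G") + primer.count("C")) / len(primer)
```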

The invention also provides polynucleotides that may encode a polypeptide that is the 
mature protein plus additional amino or carboxyl-terminal amino acids, or amino acids 
interior to the mature polypeptide (when the mature form has more than one polypeptide 
chain, for instance). Such sequences may play a role in processing of a protein from precursor 
to a mature form, may allow protein transport, may lengthen or shorten protein half-life or 
may facilitate manipulation of a protein for assay or production, among other things. As 
generally is the case in vivo, the additional amino acids may be processed away from the 
mature protein by cellular enzymes. 

A precursor protein, having the mature form of the polypeptide fused to one or more 
prosequences may be an inactive form of the polypeptide. When prosequences are removed,
such inactive precursors generally are activated. Some or all of the prosequences may be
removed before activation. Generally, such precursors are called proproteins. 

In addition to the standard A, G, C, T/U representations for nucleic acid bases, the
symbol "N" is also used. "N" means that any of the four DNA or RNA bases may appear at
such a designated position in the DNA or RNA sequence, except that it is preferred that N is
not a base that, taken in combination with adjacent nucleotide positions and read in the
correct reading frame, would generate a premature termination codon in that reading frame.
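The preference above can be checked mechanically. For each codon in the reading frame that contains an N, the sketch enumerates the concrete bases N could stand for and keeps only those that do not yield a termination codon. It assumes the standard bacterial stop codons TAA, TAG and TGA, and it treats U as T so that RNA input also works. The function names are hypothetical.

```python
from itertools import product
from typing import Dict, List

STOP_CODONS = {"TAA", "TAG", "TGA"}
BASES = "ACGT"


def codon_options(codon: str) -> List[str]:
    """All concrete codons an N-containing codon can become without being a stop codon."""
    codon = codon.upper().replace("U", "T")
    # Each N may be any of the four bases; fixed bases stay as they are.
    slots = [BASES if base == "N" else base for base in codon]
    candidates = ("".join(choice) for choice in product(*slots))
    return [c for c in candidates if c not in STOP_CODONS]


def n_codon_options(seq: str, frame: int = 0) -> Dict[int, List[str]]:
    """Map the 0-based start of each N-containing codon in the frame to its allowed codons."""
    seq = seq.upper().replace("U", "T")
    return {
        i: codon_options(seq[i:i + 3])
        for i in range(frame, len(seq) - 2, 3)
        if "N" in seq[i:i + 3]
    }
```

For example, in the codon TNA, N may be C or T but not A or G, because TAA and TGA are stop codons.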

In sum, a polynucleotide of the invention may encode a mature protein, a mature 
protein plus a leader sequence (which may be referred to as a preprotein), a precursor of a 
mature protein having one or more prosequences that are not the leader sequences of a 
preprotein, or a preproprotein, which is a precursor to a proprotein, having a leader sequence 
and one or more prosequences, which generally are removed during processing steps that 
produce active and mature forms of the polypeptide. 

Vectors, host cells, expression 

The invention also relates to vectors that comprise a polynucleotide or 
polynucleotides of the invention, host cells that are genetically engineered with vectors of the 
invention and the production of polypeptides of the invention by recombinant techniques. 
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Cell-free translation systems can also be employed to produce such proteins using RNAs 
derived from the DNA constructs of the invention. 

For recombinant production, host cells can be genetically engineered to incorporate 
expression systems or portions thereof or polynucleotides of the invention. Introduction of a 
polynucleotide into the host cell can be effected by methods described in many standard 
laboratory manuals, such as Davis et al., BASIC METHODS IN MOLECULAR BIOLOGY
(1986) and Sambrook et al., MOLECULAR CLONING: A LABORATORY MANUAL, 2nd Ed.,
Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (1989), such as calcium
phosphate transfection, DEAE-dextran mediated transfection, transvection, microinjection,
cationic lipid-mediated transfection, electroporation, transduction, scrape loading, ballistic 
introduction and infection. 

Representative examples of appropriate hosts include bacterial cells, such as
streptococci, staphylococci, enterococci, E. coli, streptomyces and Bacillus subtilis cells;
fungal cells, such as yeast cells and Aspergillus cells; insect cells such as Drosophila S2 and
Spodoptera Sf9 cells; animal cells such as CHO, COS, HeLa, C127, 3T3, BHK, 293 and
Bowes melanoma cells; and plant cells.

A great variety of expression systems can be used to produce the polypeptides of the 
invention. Such vectors include, among others, chromosomal, episomal and virus-derived 
vectors, e.g., vectors derived from bacterial plasmids, from bacteriophage, from transposons, 
from yeast episomes, from insertion elements, from yeast chromosomal elements, from 
viruses such as baculoviruses, papova viruses, such as SV40, vaccinia viruses, adenoviruses, 
fowl pox viruses, pseudorabies viruses and retroviruses, and vectors derived from 
combinations thereof, such as those derived from plasmid and bacteriophage genetic 
elements, such as cosmids and phagemids. The expression system constructs may contain 
control regions that regulate as well as engender expression. Generally, any system or vector 
suitable to maintain, propagate or express polynucleotides and/or to express a polypeptide in a 
host may be used for expression in this regard. The appropriate DNA sequence may be 
inserted into the expression system by any of a variety of well-known and routine techniques, 
such as, for example, those set forth in Sambrook et al., MOLECULAR CLONING, A
LABORATORY MANUAL (supra).

For secretion of the translated protein into the lumen of the endoplasmic reticulum, 
into the periplasmic space or into the extracellular environment, appropriate secretion signals 
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may be incorporated into the expressed polypeptide. These signals may be endogenous to the 
polypeptide or they may be heterologous signals. 

Polypeptides of the invention can be recovered and purified from recombinant cell 
cultures by well-known methods including ammonium sulfate or ethanol precipitation, acid 
extraction, anion or cation exchange chromatography, phosphocellulose chromatography, 
hydrophobic interaction chromatography, affinity chromatography, hydroxylapatite 
chromatography, and lectin chromatography. Most preferably, high performance liquid 
chromatography is employed for purification. Well known techniques for refolding protein 
may be employed to regenerate active conformation when the polypeptide is denatured during 
isolation and/or purification.

Diagnostic Assays 

This invention is also related to the use of the polynucleotides of the invention
as diagnostic reagents. Detection of such polynucleotides in a eukaryote, particularly a
mammal, and especially a human, will provide a method for diagnosing disease.
Eukaryotes (herein also "individual(s)"), particularly mammals, and especially humans,
infected with an organism comprising a gene of the invention may be detected at the nucleic
acid level by a variety of techniques.

Nucleic acids for diagnosis may be obtained from an infected individual's cells and
tissues, such as bone, blood, muscle, cartilage, and skin. Genomic DNA may be used directly
for detection or may be amplified enzymatically by using PCR or another amplification
technique prior to analysis. RNA or cDNA may also be used in the same ways. Using
amplification, the species and strain of prokaryote present in an individual may be
characterized by an analysis of the genotype of the prokaryote gene. Deletions and insertions
can be detected by a change in size of the amplified product in comparison to the genotype of
a reference sequence. Point mutations can be identified by hybridizing amplified DNA to
labeled polynucleotide sequences of the invention. Perfectly matched sequences can be
distinguished from mismatched duplexes by RNase digestion or by differences in melting
temperatures. DNA sequence differences may also be detected by alterations in the
electrophoretic mobility of the DNA fragments in gels, with or without denaturing agents, or
by direct DNA sequencing. See, e.g., Myers et al., Science, 230: 1242 (1985). Sequence
changes at specific locations also may be revealed by nuclease protection assays, such as
RNase and S1 protection, or by a chemical cleavage method. See, e.g., Cotton et al., Proc.
Natl. Acad. Sci. USA, 85: 4397-4401 (1985).

WO 98/19689 PCT/US97/19226

Cells carrying mutations or polymorphisms in the gene of the invention may also be
detected at the DNA level by a variety of techniques, for example to allow for serotyping.
RT-PCR, for instance, can be used to detect mutations. It is particularly preferred to use
RT-PCR in conjunction with automated detection systems, such as, for example, GeneScan.
RNA or cDNA may also be used for the same purpose, via PCR or RT-PCR. As an example,
PCR primers complementary to a nucleic acid encoding a polypeptide of the invention can be
used to identify and analyze mutations. These primers may be used for, among other things,
amplifying a DNA of the invention isolated from a sample derived from an individual. The
primers may be used to amplify the gene isolated from an infected individual such that the
gene may then be subjected to various techniques for elucidation of the DNA sequence. In
this way, mutations in the DNA sequence may be detected and used to diagnose infection and
to serotype and/or classify the infectious agent.
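Two readouts described in this section can be sketched as follows. A size difference between the amplified product and the reference amplicon suggests an insertion or deletion. A base-by-base comparison of a sequenced amplicon against the reference lists point substitutions. This minimal sketch assumes that the sample and reference sequences are already aligned and of equal length, with no indels. The names and example values are hypothetical.

```python
from typing import List, Tuple


def size_change(observed_bp: int, reference_bp: int) -> str:
    """Interpret an amplicon length relative to the reference amplicon."""
    diff = observed_bp - reference_bp
    if diff == 0:
        return "no size change (point mutations are still possible)"
    kind = "insertion" if diff > 0 else "deletion"
    return f"{kind} of {abs(diff)} bp"


def point_mutations(reference: str, sample: str) -> List[Tuple[int, str, str]]:
    """(1-based position, reference base, sample base) for every mismatch."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to equal length")
    return [
        (pos, ref, alt)
        for pos, (ref, alt) in enumerate(zip(reference, sample), start=1)
        if ref != alt
    ]
```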

The invention further provides a process for diagnosing disease, preferably bacterial 
infections, more preferably infections by Streptococcus pneumoniae, and most preferably 
disease, comprising determining from a sample derived from an individual an increased level
of expression of a polynucleotide having a sequence of Table 1. Increased or decreased
expression of a polynucleotide of the invention can be measured using any one of the
methods well known in the art for the quantitation of polynucleotides, such as, for example, 
amplification, PCR, RT-PCR, RNase protection, Northern blotting and other hybridization 
methods. 
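For RT-PCR, one widely used way to express an increase or decrease in expression is the comparative Ct (2^-ΔΔCt) method. It is named here as an example technique, not as one specified by this document. The method assumes roughly equal, near-100% amplification efficiencies. It normalizes the target transcript to a reference transcript measured in the same sample. The choice of reference gene and the Ct values below are hypothetical.

```python
def relative_expression(ct_target_sample: float, ct_reference_sample: float,
                        ct_target_control: float, ct_reference_control: float,
                        efficiency: float = 2.0) -> float:
    """Fold change of a target transcript in a sample versus a control (2^-ddCt).

    efficiency = 2.0 assumes the product doubles every cycle.
    """
    delta_sample = ct_target_sample - ct_reference_sample      # normalize to reference gene
    delta_control = ct_target_control - ct_reference_control
    delta_delta = delta_sample - delta_control
    return efficiency ** (-delta_delta)


# Relative to the reference gene, the target crosses threshold 3 cycles earlier in the sample.
print(relative_expression(20.0, 18.0, 23.0, 18.0))  # 8.0
```

A result above 1 indicates increased expression relative to the control, and a result below 1 indicates decreased expression.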

In addition, a diagnostic assay in accordance with the invention for detecting
overexpression of a polypeptide of the invention compared to normal control tissue samples
may be used to detect the presence of an infection, for example. Assay techniques that can be
used to determine levels of a protein in a sample derived from a host are well known to those
of skill in the art. Such assay methods include radioimmunoassays, competitive-binding
assays, Western blot analysis and ELISAs.

Antibodies 

The polypeptides of the invention or variants thereof, or cells expressing them can be 
used as an immunogen to produce antibodies immunospecific for such polypeptides. 
"Antibodies" as used herein includes monoclonal and polyclonal antibodies, chimeric, single
chain, simianized and humanized antibodies, as well as Fab fragments, including
the products of an Fab immunoglobulin expression library.
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Antibodies generated against the polypeptides of the invention can be obtained by 
administering the polypeptides or epitope-bearing fragments, analogues or cells to an animal, 
preferably a nonhuman, using routine protocols. For preparation of monoclonal antibodies, 
any technique known in the art that provides antibodies produced by continuous cell line 
cultures can be used. Examples include various techniques, such as those in Kohler, G. and 
Milstein, C, Nature 256: 495-497 (1975); Kozbor et al, Immunology Today 4: 72 (1983); 
Cole et al., pg. 77-96 in MONOCLONAL ANTIBODIES AND CANCER THERAPY, Alan R. 
Liss, Inc. (1985). 

Techniques for the production of single chain antibodies (U.S. Patent No. 4,946,778) 
can be adapted to produce single chain antibodies to polypeptides of this invention. Also, 
transgenic mice, or other organisms such as other mammals, may be used to express 
humanized antibodies. 

Alternatively, phage display technology may be utilized to select antibody genes
with binding activities towards the polypeptide, either from repertoires of PCR-amplified
V-genes of lymphocytes from humans screened for possessing recognition of a polypeptide of
the invention or from naive libraries (McCafferty, J. et al. (1990), Nature 348, 552-554;
Marks, J. et al. (1992), Biotechnology 10, 779-783). The affinity of these antibodies can
also be improved by chain shuffling (Clackson, T. et al. (1991), Nature 352, 624-628).

If two antigen binding domains are present, each domain may be directed against a
different epitope; such antibodies are termed 'bispecific' antibodies.

The above-described antibodies may be employed to isolate or to identify clones
expressing the polypeptides, or to purify the polypeptides by affinity chromatography.

Thus, among others, antibodies against a polypeptide of the invention may be 
employed to treat disease. 

Polypeptide variants include antigenically, epitopically or immunologically 
equivalent variants that form a particular aspect of this invention. The term "antigenically 
equivalent derivative" as used herein encompasses a polypeptide or its equivalent which 
will be specifically recognized by certain antibodies which, when raised to the protein or 
polypeptide according to the invention, interfere with the immediate physical interaction 
between pathogen and mammalian host. The term "immunologically equivalent derivative" 
as used herein encompasses a peptide or its equivalent which when used in a suitable 
formulation to raise antibodies in a vertebrate, the antibodies act to interfere with the 
immediate physical interaction between pathogen and mammalian host. 
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The polypeptide, such as an antigenically or immunologically equivalent derivative 
or a fusion protein thereof is used as an antigen to immunize a mouse or other animal such 
as a rat or chicken. The fusion protein may provide stability to the polypeptide. The 
antigen may be associated, for example by conjugation, with an immunogenic carrier 
protein for example bovine serum albumin (BSA) or keyhole limpet haemocyanin (KLH). 
Alternatively a multiple antigenic peptide comprising multiple copies of the protein or 
polypeptide, or an antigenically or immunologically equivalent polypeptide thereof may be 
sufficiently antigenic to improve immunogenicity so as to obviate the use of a carrier. 

Preferably, the antibody or variant thereof is modified to make it less immunogenic
in the individual. For example, if the individual is human, the antibody may most
preferably be "humanized", wherein the complementarity determining region(s) of the
hybridoma-derived antibody has been transplanted into a human monoclonal antibody, for
example as described in Jones, P. et al. (1986), Nature 321, 522-525 or Tempest et
al. (1991), Biotechnology 9, 266-273.

The use of a polynucleotide of the invention in genetic immunization will 
preferably employ a suitable delivery method such as direct injection of plasmid DNA into 
muscles (Wolff et al., Hum Mol Genet 1992, 1:363; Manthorpe et al., Hum. Gene Ther.
1993:4, 419), delivery of DNA complexed with specific protein carriers (Wu et al., J Biol
Chem. 1989: 264,16985), coprecipitation of DNA with calcium phosphate (Benvenisty &
Reshef, PNAS, 1986:83,9551), encapsulation of DNA in various forms of liposomes
(Kaneda et al., Science 1989:243,375), particle bombardment (Tang et al., Nature 1992,
356:152; Eisenbraun et al., DNA Cell Biol 1993, 12:791) and in vivo infection using cloned
retroviral vectors (Seeger et al., PNAS 1984:81,5849).

Antagonists and agonists - assays and molecules 

Polypeptides of the invention may also be used to assess the binding of small 
molecule substrates and ligands in, for example, cells, cell-free preparations, chemical 
libraries, and natural product mixtures. These substrates and ligands may be natural substrates 
and ligands or may be structural or functional mimetics. See, e.g., Coligan et al., Current
Protocols in Immunology 1(2): Chapter 5 (1991). 

The invention also provides a method of screening compounds to identify those
which enhance (agonist) or block (antagonist) the action of a polypeptide or polynucleotide
of the invention, particularly those compounds that are bacteriostatic and/or bactericidal.
The method of screening may involve high-throughput techniques. For example, to screen for
agonists or antagonists, a synthetic reaction mix, a cellular compartment, such as a membrane,
cell envelope or cell wall, or a preparation of any thereof, comprising a polypeptide of the
invention and a labeled substrate or ligand of such polypeptide is incubated in the absence or
the presence of a candidate molecule that may be an agonist or antagonist of a polypeptide of
the invention. The ability of the candidate molecule to agonize or antagonize a polypeptide of
the invention is reflected in increased or decreased binding of the labeled ligand, or
production of product from such substrate, respectively. Molecules that bind gratuitously,
i.e., without inducing the effects of a polypeptide of the invention, are most likely to be good
antagonists. Molecules that bind well and increase the rate of product production from
substrate are agonists. Detection of the rate or level of production of product from substrate
may be enhanced by using a reporter system. Reporter systems that may be useful in this
regard include, but are not limited to, colorimetric labeled substrate converted into product, a
reporter gene that is responsive to changes in polynucleotide or polypeptide activity, and
binding assays known in the art.
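The readout from such a screen is usually normalized to the no-candidate control before compounds are called agonists or antagonists. The sketch below does that with background subtraction. The ±20% decision margin is an arbitrary placeholder, not a value from this specification. A real screen would set its thresholds from its own replicate variability.

```python
def percent_of_control(signal: float, control_signal: float,
                       background: float = 0.0) -> float:
    """Activity with the candidate as a percentage of activity without it."""
    span = control_signal - background
    if span <= 0:
        raise ValueError("control must exceed background")
    return 100.0 * (signal - background) / span


def classify_candidate(signal: float, control_signal: float,
                       background: float = 0.0, margin: float = 20.0) -> str:
    """Call a candidate antagonist, agonist or inactive; `margin` is a placeholder."""
    pct = percent_of_control(signal, control_signal, background)
    if pct <= 100.0 - margin:
        return "candidate antagonist"   # less binding or product than control
    if pct >= 100.0 + margin:
        return "candidate agonist"      # more product than control
    return "no clear effect"
```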

Another example of an assay for antagonists of polypeptides of the invention is a 
competitive assay that combines any such polypeptide and a potential antagonist with a 
compound which binds such polypeptide, natural substrates or ligands, or substrate or ligand 
mimetics, under appropriate conditions for a competitive inhibition assay. A polypeptide of 
the invention can be labeled, such as by radioactivity or a colorimetric compound, such that 
the number of such polypeptide molecules bound to a binding molecule or converted to 
product can be determined accurately to assess the effectiveness of the potential antagonist. 
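The effectiveness of a potential antagonist in such a competitive assay is often summarized as IC50, the concentration that halves labeled binding. The sketch below is a simple stand-in for fitting a full dose-response curve. It interpolates on a log-concentration scale between the two measured points that bracket 50% binding. The data in the usage line are invented.

```python
import math
from typing import Optional, Sequence


def ic50_by_interpolation(concentrations: Sequence[float],
                          percent_bound: Sequence[float]) -> Optional[float]:
    """Concentration at which labeled binding falls to 50% of the no-competitor value."""
    if any(c <= 0 for c in concentrations):
        raise ValueError("concentrations must be positive for a log scale")
    points = sorted(zip(concentrations, percent_bound))
    for (c_low, b_low), (c_high, b_high) in zip(points, points[1:]):
        # Use the first pair of adjacent points whose binding values straddle 50%.
        if b_low >= 50.0 >= b_high and b_low != b_high:
            fraction = (b_low - 50.0) / (b_low - b_high)
            log_c = math.log10(c_low) + fraction * (math.log10(c_high) - math.log10(c_low))
            return 10 ** log_c
    return None  # 50% binding is not bracketed by the data


print(ic50_by_interpolation([1, 10, 100], [90, 60, 40]))  # about 31.6
```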

Potential antagonists include small organic molecules, peptides, polypeptides and
antibodies that bind to a polynucleotide or polypeptide of the invention and thereby inhibit or
extinguish its activity. Potential antagonists also may be small organic molecules, a peptide, a
polypeptide such as a closely related protein or antibody that binds the same sites on a binding
molecule without inducing activities induced by a polypeptide of the invention, thereby
preventing the action of such polypeptide by excluding it from binding.

Potential antagonists include a small molecule that binds to and occupies the binding 
site of the polypeptide thereby preventing binding to cellular binding molecules, such that 
normal biological activity is prevented. Examples of small molecules include but are not 
limited to small organic molecules, peptides or peptide-like molecules. Other potential 
antagonists include antisense molecules (see Okano, J. Neurochem. 56: 560 (1991); 
OLIGODEOXYNUCLEOTIDES AS ANTISENSE INHIBITORS OF GENE EXPRESSION,
CRC Press, Boca Raton, FL (1988), for a description of these molecules). Preferred potential
antagonists include compounds related to and variants of a polypeptide of the invention. 

Each of the DNA sequences provided herein may be used in the discovery and 
development of antibacterial compounds. The encoded protein, upon expression, can be 
used as a target for the screening of antibacterial drugs. Additionally, the DNA sequences 
encoding the amino terminal regions of the encoded protein or Shine-Dalgarno or other
translation facilitating sequences of the respective mRNA can be used to construct 
antisense sequences to control the expression of the coding sequence of interest. 

The invention also provides the use of the polypeptide, polynucleotide or inhibitor 
of the invention to interfere with the initial physical interaction between a pathogen and 
mammalian host responsible for sequelae of infection. In particular, the molecules of the
invention may be used: in the prevention of adhesion of bacteria, in particular Gram-positive
bacteria, to mammalian extracellular matrix proteins on in-dwelling devices or to
extracellular matrix proteins in wounds; to block protein-mediated mammalian cell
invasion by, for example, initiating phosphorylation of mammalian tyrosine kinases
(Rosenshine et al., Infect. Immun. 60:2211 (1992)); to block bacterial adhesion between
mammalian extracellular matrix proteins and bacterial proteins that mediate tissue damage;
and to block the normal progression of pathogenesis in infections initiated other than by
the implantation of in-dwelling devices or by other surgical techniques.

The antagonists and agonists of the invention may be employed, for instance, to 
inhibit and treat disease. 

Helicobacter pylori (herein H. pylori) bacteria infect the stomachs of over one-third
of the world's population, causing stomach cancer, ulcers, and gastritis (International
Agency for Research on Cancer (1994) Schistosomes, Liver Flukes and Helicobacter Pylori
(International Agency for Research on Cancer, Lyon, France;
http://www.uicc.ch/ecp/ecp2904.htm)). Moreover, the International Agency for Research on
Cancer recently recognized a cause-and-effect relationship between H. pylori and gastric
adenocarcinoma, classifying the bacterium as a Group I (definite) carcinogen. Preferred
antimicrobial compounds of the invention found using screens provided by the invention,
particularly broad-spectrum antibiotics, should be useful in the treatment of H. pylori
infection. Such treatment should decrease the incidence of H. pylori-induced cancers, such as
gastrointestinal carcinoma. Such treatment should also cure gastric ulcers and gastritis.

Vaccines 




SUBSTITUTE SHEET (RULE 26) 



WO 98/19689 



PCT/US97/19226 



Another aspect of the invention relates to a method for inducing an immunological
response in an individual, particularly a mammal, which comprises inoculating the
individual with a polypeptide of the invention, or a fragment or variant thereof, adequate to
produce antibody and/or T cell immune response to protect said individual from infection,
particularly bacterial infection and most particularly Streptococcus pneumoniae infection.
Also provided are methods whereby such immunological response slows bacterial
replication. Yet another aspect of the invention relates to a method of inducing an
immunological response in an individual which comprises delivering to such individual a
nucleic acid vector to direct expression of a polynucleotide or polypeptide of the invention,
or a fragment or a variant thereof, for expressing such polynucleotide or polypeptide, or a
fragment or a variant thereof, in vivo in order to induce an immunological response, such as
to produce antibody and/or T cell immune response, including, for example, cytokine-producing
T cells or cytotoxic T cells, to protect said individual from disease, whether that
disease is already established within the individual or not. One way of administering the
gene is by accelerating it into the desired cells as a coating on particles or otherwise. Such
nucleic acid vector may comprise DNA, RNA, a modified nucleic acid, or a DNA/RNA
hybrid.

A further aspect of the invention relates to an immunological composition which,
when introduced into an individual capable of having induced within it an immunological
response, induces an immunological response in such individual to a polynucleotide of the
invention or protein coded therefrom, wherein the composition comprises a recombinant
polynucleotide or protein coded therefrom comprising DNA which codes for and expresses
an antigen of said polynucleotide or protein coded therefrom. The immunological response
may be used therapeutically or prophylactically and may take the form of antibody
immunity or cellular immunity, such as that arising from CTL or CD4+ T cells.

A polypeptide of the invention or a fragment thereof may be fused with a co-protein
which may not by itself produce antibodies, but is capable of stabilizing the first protein
and producing a fused protein which will have immunogenic and protective properties.
Such a fused recombinant protein preferably further comprises an antigenic co-protein, such
as lipoprotein D from Haemophilus influenzae, glutathione S-transferase (GST) or
beta-galactosidase, relatively large co-proteins which solubilize the protein and facilitate
production and purification thereof. Moreover, the co-protein may act as an adjuvant in the
sense of providing a generalized stimulation of the immune system. The co-protein may be
attached to either the amino or carboxy terminus of the first protein.

Provided by this invention are compositions, particularly vaccine compositions, and
methods comprising the polypeptides or polynucleotides of the invention and
immunostimulatory DNA sequences, such as those described in Sato, Y. et al., Science 273:352
(1996).

Also provided by this invention are methods using the described polynucleotide, or
particular fragments thereof that have been shown to encode non-variable regions of
bacterial cell surface proteins, in DNA constructs used in genetic immunization
experiments in animal models of infection with Streptococcus pneumoniae. Such methods will be
particularly useful for identifying protein epitopes able to provoke a prophylactic or
therapeutic immune response. It is believed that this approach will allow for the
subsequent preparation of monoclonal antibodies of particular value, from the requisite
organ of the animal successfully resisting or clearing infection, for the development of
prophylactic agents or therapeutic treatments of bacterial infection, particularly
Streptococcus pneumoniae infection, in mammals, particularly humans.

The polypeptide may be used as an antigen for vaccination of a host to produce 
specific antibodies which protect against invasion of bacteria, for example by blocking 
adherence of bacteria to damaged tissue. Examples of tissue damage include wounds in 
skin or connective tissue caused, e.g., by mechanical, chemical or thermal damage or by 
implantation of indwelling devices, or wounds in the mucous membranes, such as the 
mouth, mammary glands, urethra or vagina. 

The invention also includes a vaccine formulation which comprises an
immunogenic recombinant protein of the invention together with a suitable carrier. Since
the protein may be broken down in the stomach, it is preferably administered parenterally,
including, for example, administration that is subcutaneous, intramuscular, intravenous, or
intradermal. Formulations suitable for parenteral administration include aqueous and
non-aqueous sterile injection solutions which may contain anti-oxidants, buffers, bacteriostats
and solutes which render the formulation isotonic with the bodily fluid, preferably the
blood, of the individual; and aqueous and non-aqueous sterile suspensions which may
include suspending agents or thickening agents. The formulations may be presented in
unit-dose or multi-dose containers, for example, sealed ampules and vials, and may be
stored in a freeze-dried condition requiring only the addition of the sterile liquid carrier
immediately prior to use. The vaccine formulation may also include adjuvant systems for
enhancing the immunogenicity of the formulation, such as oil-in-water systems and other
systems known in the art. The dosage will depend on the specific activity of the vaccine
and can be readily determined by routine experimentation.

While the invention has been described with reference to certain proteins, such as,
for example, those set forth in Table 1, it is to be understood that this covers fragments of
the naturally occurring proteins and similar proteins with additions, deletions or
substitutions which do not substantially affect the immunogenic properties of the
recombinant protein.

Compositions, kits and administration 

The invention also relates to compositions comprising the polynucleotide or the 
polypeptides discussed above or their agonists or antagonists. The polypeptides of the 
invention may be employed in combination with a non-sterile or sterile carrier or carriers for 
use with cells, tissues or organisms, such as a pharmaceutical carrier suitable for 
administration to a subject. Such compositions comprise, for instance, a media additive or a 
therapeutically effective amount of a polypeptide of the invention and a pharmaceutically
acceptable carrier or excipient. Such carriers may include, but are not limited to, saline,
buffered saline, dextrose, water, glycerol, ethanol and combinations thereof. The formulation 
should suit the mode of administration. The invention further relates to diagnostic and 
pharmaceutical packs and kits comprising one or more containers filled with one or more of 
the ingredients of the aforementioned compositions of the invention. 

Polypeptides and other compounds of the invention may be employed alone or in 
conjunction with other compounds, such as therapeutic compounds. 

The pharmaceutical compositions may be administered in any effective, convenient 
manner including, for instance, administration by topical, oral, anal, vaginal, intravenous, 
intraperitoneal, intramuscular, subcutaneous, intranasal or intradermal routes among others. 

In therapy or as a prophylactic, the active agent may be administered to an 
individual as an injectable composition, for example as a sterile aqueous dispersion, 
preferably isotonic. 

Alternatively, the composition may be formulated for topical application,
for example in the form of ointments, creams, lotions, eye ointments, eye drops, ear drops,
mouthwash, impregnated dressings and sutures, and aerosols, and may contain appropriate
conventional additives, including, for example, preservatives, solvents to assist drug
penetration, and emollients in ointments and creams. Such topical formulations may also
contain compatible conventional carriers, for example cream or ointment bases, and ethanol
or oleyl alcohol for lotions. Such carriers may constitute from about 1% to about 98% by
weight of the formulation; more usually they will constitute up to about 80% by weight of
the formulation.

For administration to mammals, and particularly humans, it is expected that the 
daily dosage level of the active agent will be from 0.01 mg/kg to 10 mg/kg, typically 
around 1 mg/kg. The physician in any event will determine the actual dosage which will be 
most suitable for an individual and will vary with the age, weight and response of the 
particular individual. The above dosages are exemplary of the average case. There can, of 
course, be individual instances where higher or lower dosage ranges are merited, and such 
are within the scope of this invention. 
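The dosage statement above is expressed per kilogram of body weight, so the daily amount scales linearly with weight. The Python sketch below only illustrates that arithmetic for the stated range of 0.01 mg/kg to 10 mg/kg and the typical value of about 1 mg/kg. It is not dosing guidance: as the text says, the physician determines the actual dose. The function name is hypothetical, and the 70 kg body weight in the usage line is an assumed example value.

```python
# Illustrative arithmetic only, using the per-kg figures stated in the text.
MIN_MG_PER_KG = 0.01      # lower bound of the stated daily range
TYPICAL_MG_PER_KG = 1.0   # "typically around 1 mg/kg"
MAX_MG_PER_KG = 10.0      # upper bound of the stated daily range


def daily_amount_mg(body_weight_kg: float) -> tuple:
    """Return (lowest, typical, highest) daily amounts in mg for a body weight."""
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    return (
        body_weight_kg * MIN_MG_PER_KG,
        body_weight_kg * TYPICAL_MG_PER_KG,
        body_weight_kg * MAX_MG_PER_KG,
    )


if __name__ == "__main__":
    low, typical, high = daily_amount_mg(70.0)  # hypothetical 70 kg adult
    print(f"{low:.2f} mg to {high:.0f} mg per day, typically {typical:.0f} mg")
```

For a 70 kg body weight the stated range works out to 0.7 mg to 700 mg per day, with a typical amount of about 70 mg.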

In-dwelling devices include surgical implants, prosthetic devices and catheters, i.e., 
devices that are introduced to the body of an individual and remain in position for an 
extended time. Such devices include, for example, artificial joints, heart valves, 
pacemakers, vascular grafts, vascular catheters, cerebrospinal fluid shunts, urinary
catheters, and continuous ambulatory peritoneal dialysis (CAPD) catheters.

The composition of the invention may be administered by injection to achieve a 
systemic effect against relevant bacteria shortly before insertion of an in-dwelling device. 
Treatment may be continued after surgery during the in-body time of the device. In 
addition, the composition could also be used to broaden perioperative cover for any surgical 
technique to prevent bacterial wound infections, especially Streptococcus pneumoniae 
wound infections. 

Many orthopedic surgeons believe that patients with prosthetic joints should be
considered for antibiotic prophylaxis before dental treatment that could produce a
bacteremia. Late deep infection is a serious complication, sometimes leading to loss of the
prosthetic joint, and is accompanied by significant morbidity and mortality. It may
therefore be possible to extend the use of the active agent as a replacement for prophylactic
antibiotics in this situation.

In addition to the therapy described above, the compositions of this invention may 
be used generally as a wound treatment agent to prevent adhesion of bacteria to matrix 
proteins exposed in wound tissue and for prophylactic use in dental treatment as an 
alternative to, or in conjunction with, antibiotic prophylaxis. 
Alternatively, the composition of the invention may be used to bathe an indwelling
device immediately before insertion. The active agent will preferably be present at a
concentration of 1 µg/ml to 10 mg/ml for bathing of wounds or indwelling devices.

A vaccine composition is conveniently in injectable form. Conventional adjuvants 
may be employed to enhance the immune response. A suitable unit dose for vaccination is 
0.5-5 microgram/kg of antigen, and such dose is preferably administered 1-3 times and 
with an interval of 1-3 weeks. With the indicated dose range, no adverse toxicological 
effects will be observed with the compounds of the invention which would preclude their 
administration to suitable individuals. 
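Because the vaccination unit dose is also expressed per kilogram, and the schedule as a number of doses separated by a fixed interval, both can be worked out mechanically. This Python sketch is a hypothetical illustration of that arithmetic under the ranges quoted above (0.5 to 5 micrograms of antigen per kg, 1 to 3 doses, 1 to 3 weeks apart). It assumes the first dose is given at week 0, and it is not a recommended regimen.

```python
# Illustrative arithmetic only; ranges are those quoted in the text.
def vaccination_plan(body_weight_kg: float, doses: int, interval_weeks: int):
    """Return ((min_ug, max_ug) per dose, [week of each dose]).

    Assumes the first dose is given at week 0.
    """
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    if not 1 <= doses <= 3:
        raise ValueError("the text describes 1-3 doses")
    if not 1 <= interval_weeks <= 3:
        raise ValueError("the text describes intervals of 1-3 weeks")
    per_dose_ug = (0.5 * body_weight_kg, 5.0 * body_weight_kg)
    dose_weeks = [i * interval_weeks for i in range(doses)]
    return per_dose_ug, dose_weeks


if __name__ == "__main__":
    ug_range, weeks = vaccination_plan(70.0, doses=3, interval_weeks=2)
    print(ug_range, weeks)
```

For a hypothetical 70 kg individual receiving three doses two weeks apart, this gives 35 to 350 micrograms of antigen per dose at weeks 0, 2 and 4.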

Each reference disclosed herein is incorporated by reference herein in its entirety. 
Any patent application to which this application claims priority is also incorporated by 
reference herein in its entirety. 
TABLES 

Certain pertinent data for preferred polypeptide and polynucleotide embodiments of 
the invention are summarized in Tables 1 and 2. 

Provided in Table 1 are sequence search results providing characterization 
information regarding certain preferred polynucleotides (denoted as "Assembly") and 
polypeptides of the invention encoded thereby. For each polynucleotide in Table 1, there is 
listed the closest homologue of each polypeptide encoded by each ORF in such 
polynucleotide. This determination of homology is based on a comparison of the sequences
in Table 1 with sequences available in the public domain (see heading entitled
"Description" for the homologue name). Where no significant homologue was detected the 
term "unknown" appears after the heading "Description". Preferred polypeptides encoded 
by the ORFs of the invention, particularly full length proteins either obtained using such 
ORFs or encoded entirely by such ORFs, are ones that have a biological function of the 
homologue listed, among other functions. The analysis used to determine each homologue 
listed in Table 1 was either BlastP and/or BlastX and/or MPSearch, each of which is well 
known. Also provided in Table 1 is the amino acid sequence encoded by each ORF. An 
"Assembly ID" number provides a convenient way to correlate the polynucleotide sequence 
with the ORF or ORFs it comprises and the polypeptides encoded by these ORFs, as well as 
to correlate such sequences with other pertinent information provided in Tables 1 and 2. 
Following the heading "ORF Predictions" the nucleotides at the beginning and end of the 
ORF sequence are set forth ("Start" and "End" respectively). The direction of translation 
on the polynucleotide depicted is denoted by an "F" for forward or an "R" for reverse
(reverse being translated on the opposite strand from the one depicted). The length of each 
amino acid sequence is also indicated in a column entitled "Length." Below these data is 
shown the amino acid sequence encoded by the ORF. If a given polynucleotide comprises 
one ORF, then in the column entitled "ORF #" there is the numeral one. If it encodes two, 
there are the numerals one and two in the column, and so on. 
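The coordinate conventions described above can be made concrete with a short Python sketch. It assumes, consistently with the entries in Table 1, that "Start" and "End" are 1-based, inclusive positions on the assembly as printed, and that an "R" ORF is read by reverse-complementing that span. It also assumes every codon is translated with the standard genetic code (NCBI translation table 1), so a GTG initiation codon appears as V, as in the listed translations, and a codon containing an unresolved base such as N appears as X. Under these assumptions the "Length" column equals (End - Start + 1)/3, a count that includes the terminal stop codon. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)}
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement; characters other than A, C, G, T, N are left unchanged."""
    return seq.translate(_COMPLEMENT)[::-1]


def orf_length(start: int, end: int) -> int:
    """Codon count for 1-based inclusive coordinates (stop codon included)."""
    return (end - start + 1) // 3


def orf_translation(assembly: str, start: int, end: int, direction: str) -> str:
    """Translate the ORF at 1-based, inclusive positions start..end.

    direction "F": read start..end on the strand as printed.
    direction "R": read the opposite strand (reverse complement of start..end).
    """
    seq = "".join(assembly.split()).upper()  # drop line breaks and stray spaces
    if not 1 <= start <= end <= len(seq):
        raise ValueError("coordinates fall outside the assembly")
    region = seq[start - 1:end]
    if direction == "R":
        region = reverse_complement(region)
    elif direction != "F":
        raise ValueError("direction must be 'F' or 'R'")
    if len(region) % 3:
        raise ValueError("ORF length is not a multiple of 3")
    codons = (region[i:i + 3] for i in range(0, len(region), 3))
    # Any codon with an ambiguous or unexpected character is reported as X.
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)
```

Applied to the Assembly 3049156 sequence with start=236, end=385 and direction="R", this sketch reproduces the SEQ ID NO: 88 translation, and orf_length(236, 385) gives the listed length of 50.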



TABLE 1 

Assembly ID: 3049156 
Assembly Length: 495bp 

>[SEQ ID NO:1] 3049156 Strep Assembly -- Assembly id#3049156
CTCGGTGATAGAAATAGTGTAATCATGCTTTTCTCTTCTTATCTATACTTTGCTACTTCT 
ATTATACAAAAAAATAAAGCGCTTGACTAGGGATTTTTAGAAAAAAAGCCTATTTTTTCA 
AGAAAAATAGGCTTTTTGCGAACGATTGACACAATTGGATTTGGTTAATTCACTCTTAAC
GATGGTTTTAAACGATATATATTTTTATATATGTAAATTAAAAACTTCTTTCCTTTCACT 
TCCTACGACTTTTCAGATACAGATAGCCAAAGAAGTTTTCATAGAGGGCAAAAAAGAGGA 
GGAAGGCATGAAGAAAGAAGGTCTCTGGCAAAATCATAATAACAGGATCCTTGGCTGGAT 
CAAAAAGCCAGGTATCATCTCCCACAAAGAGAATTTGATGGAAAAGAGTAAAGAATTGGT 
CAAAACCAATCAAAACTCCCCCAAGTCCATCATCACAGGTAAGACTACTAGAGCCAGGAG 
ACTTTTTCGATAAAG 

ORF Predictions: 

ORF # Start End Direction Length 



1 236 385 R 50 aa 

>[SEQ ID NO: 88] 3049156-1 ORF translation from 236-385, direction R 
VGDDTWLFDPAKDPVIMILPETFFLHAFLLFFALYENFFGYLYLKSRRK* 

Description: 
unknown 

Assembly ID: 3049862 
Assembly Length: 529bp

>[SEQ ID NO:2] 3049862 Strep Assembly -- Assembly id#3049862
CTAGAGCAAGTATTTTTCAAACTTTTTCCGAATAAATAGATAGAGCCAGAGAATTTAGTA 
AACCTAGATTTAAAAATGTGCTATAACATAATATATTGAATCTATAATAGTACACCTTGA 
CTGCTAAAATATTTCTATAAATTAATTTGACTTTCCTGATAGAGTTATTCACATCTTATT 
TCAACTCACTATAGAAGGAGGAATAGGAGGATTCTCAGACATCCGGGCATCAGCCCAACT 
AATGATTTGATTGCTAAGAAAATATTCAGCAATCCAGAAATCACTTGTCAATTTATTCGC
GATATGCTGGACTTGCCAGCAAAAAATGTTGACCATTTTGGAGGGAAGCGATATTCACGT 
ATTACTCTCCATGCCTTACTCAGTGCAGGATTTTTATACCAGTATAGACGTCTTGGCGGA 
GTTGGATAACGGTACTCAAGTAATTATTGAGATTCAAGTCCATCATCAGAATTTTTCATC 
AATCACTTGTGGACTTACCTGTGCAGTCAGGTTAATCAAATCTTGAAAA 

ORF Predictions: 

ORF # Start End Direction Length 



1 383 526 F 48 aa 

>[SEQ ID NO: 89] 3049862-1 ORF translation from 383-526, direction F 
VQDFYTSIDVLAELDNGTQVIIEIQVHHQNFSSITCGLTCAVRLIKS* 

Description:
unknown 

Assembly ID: 3112810 
Assembly Length: 885bp 

>[SEQ ID NO:3] 3112810 Strep Assembly -- Assembly id#3112810 
CTCATCATCTGTCAAAAAGCGTTTCTTAGCAGTCGTGATATCCATAAAATAATCTAATAT 
CACGATTTCCTCATCCGCAAAGAAAGGAAGGCTGACCAACTCCAGTGCCACATCCTTGTA 
AACTACTTCTTGCATATCAAAGTAGGCAAAGTTGAGGTCAGCAGAATCATACCCAATCTG 
TTTCAACACTTGACTCTTCATCACTTCAAACTGACCCTGATCTGTCCCTGTAAATAGGCG 
CAGGCTCGGTAAATTCGATAAAGTCAACTTCTGACTTTCTTCAATGGCTAGCATCGTCTC 
TCCTTTCTTCAGATTTTTCGATTTAATTTAGTCAATATAGCGCAATTTCCCACGGAAATC 
TTCTAAGCTCTCGTAGCCTTTTTCCACCATGATTGCTTTCAGTTCATTGGTAAAGCGGTC 
AAAAGCACTGACGCCTTCTTTGTGAAGGGTCGTTCCCACCTGCACCATACTTGCTCCACA 
GAGGATGTGTTCAAAGGCATCTCGACCAGTCAGAACGCCACCTGTTCCGATAATTTGGAT 
TTGAGGATTTAAACGTTGATAAAAGGCGTGAACATTGGCTAGAGCAGTCGGTTTGATGTA 
TTATCCACCAATTCCACCAAAACCATTCTTAGGCCGAATAACGACAGATTCGTCTTCTAT 
ATAGAGGCCGTTTCCGATAGAGTTAACGCAGTTGACAAACTTGAGCGGATATTTGTTGAA 
AATAGCTGCCGCTTGATCAAAGTGAACAATATCAAAATAAGGTGGCAATTTAATTCCAAG 
AGGTTTGGTGAAGTAAGCAAACACTTCTGCCAAAATCCGGTCTGTTGTCTCAAAATCATA 
GGCAATCTGAGGTTTACCTGGAACATTTGGACAGGAAAGATTTAG 

ORF Predictions: 

ORF # Start End Direction Length 



1 601 804 R 68 aa 

>[SEQ ID NO: 90] 3112810-2 ORF translation from 601-804, direction R 
VFAYFTKPLGIKLPPYFDIVHFDQAAAIFNKYPLKFVNCVNSIGNGLYIEDESVVIRPKN
GFGGIGG*

Description: 

LLCPYRDA NCBI gi:511014 - Lactococcus lactis. DIHYDROOROTATE
DEHYDROGENASE (EC 1.3.3.1) (DIHYDROOROTATE OXIDASE)

Assembly ID: 3112866 
Assembly Length: 925bp

>[SEQ ID NO: 4] 3112866 Strep Assembly -- Assembly id#3112866 
TCTTGGCCAACTGCATGGAGTTCAGCGGTCAATTTCAACGCACCTGAGAAACAGACCCCT 
GCACCCCTGAAATCTCAGGAGACATGATGGTCTGGATGGAATCAATAATGAGAAAGTCTG 
GCTGGATACGCTACCACTTCTGCACGAACACTCTGCATATTGGTCTCTGCATAGAGATAA 
AACTCACTATCAAAATCACCTAAGCGCTCTGCACGTAGTTTAATCTGCTGGGCAGACTCC 
TCCCCACTGACATAGAGAACTGTCCCCACTTGGGACAACTGGGTTGAGACTTGTAGGAGA 
AGAGTTGATTTCCCAATCCCAGGATCCCCACCGATGAGGACGAGACTTTCCTGGTACAAC 
TCCGCCTCCAAGCACACGGTTGAATTCCTCCATCTCCGTCTTGGTTCGATTGACATTGAT 
GGAAGTCACCTCAGCTAGTTTCATGGGCTTGGTTTTCTCACCTGTCAAGGACACACGCGC 
ATTCTTGACCTCGGCAACCTCAACCTCTTCCACAAAAGAAGACCAAGACCCACAGTTGGG 
GCAACGTCCCAGATATTTAGGGGAATTATACCCACAATTTTGACATACAAATGTCGCTTT 
TTTCTTTGCGATGACAAACCTCTTTCTATATCTCTAACTCACACTCAATCACTTGGCAAA 
AATCAATCTTCTCATTTGGCACAAACTGGCGCATGAGCATTCGATGAGCAACAACTACCA 
CAGTCTGATGTTCTCGATACTTAGACATACATTCTAGAAACCGAGACTTCATTTCCGTAG 
CTGTCTCATATTGAATAGGACTATTAGGAAGCAACTCCCCCTTGTTTTCTAAAAACAGTC 
TTCTAGCTGTTTCAAAGTTTTCTATTCCTGTTTTATAGACCTGCCATTCATGTAATAAAG 
GCTCTACTCTTAAAGGAAGACCCGT 

ORF Predictions: 

ORF # Start End Direction Length 



1 220 513 R 98 aa 

>[SEQ ID NO: 91] 3112866-2 ORF translation from 220-513, direction R 

VEEVEVAEVKNARVSLTGEKTKPMKLAEVTSINVNRTKTEMEEFNRVLGGGWPGKSRP 

RWGSWDWEINSSPTSLNPWPSGDSSLCQWGGVCPAD* 

Description:
SMS PROTEIN. - ESCHERICHIA COLI.

Assembly ID: 3113664 
Assembly Length: 602bp 

>[SEQ ID NO: 5] 3113664 Strep Assembly -- Assembly id#3113664
TTATGTCAGTGGGATTACGCCTAATCTCCCAGAAGCAGAATTATTATCCGGTCAGGAAAT 
TAAAACCTTGGNAGACATGAAAACTGCAGCGCAGAAATTGCATGATTTAGGAGCGCCAGC 
AGTCATTATCAAAGGGAGGCAATCGTCTTAGTCAGGACAAGGCTGTGGATGTCTTTTATG 
ATGGACAGACCTTTACTATCCTAGAAAATCCAGTTATCCAAGGCCAAAATGCTGGTGCAG 
GTTGTACCTTTGCCTCTAGCATTGCCAGTCACTTGGTTAAAGGTGATAAACTTTTGCCAG 
CAGTAGAAAGCTCTAAGGCTTTCGTTTATCGTGCTATTGCACAAGCAGATCAGTATGGAG 
TAAGACAATATGAAGCAAACAAAAACAACTAAAATCGCCCTTGTATCCCTATTAACCGCC 
CTTTCTGTGGTTCTAGGTTATTTCTTAAAAATCCCAACACCTACAGGNATTCTAACTCTT 
TTAGATGCTGGTGTCTTCTTTGCGGCCTTTTACTTTGGTAGTCGTGAAGGAGCGGTAGTC 
GGAGGACTAGCAAGTTTCTTGCTTGACCTCTTATCAGGCTACCCTCAGTGGATGTTTTTT 
AG 

ORF Predictions: 

ORF # Start End Direction Length 



1 165 392 F 76 aa 

>[SEQ ID NO: 92] 3113664-1 ORF translation from 165-392, direction F 
VDVFYDGQTFTILENPVIQGQNAGAGCTFASSIASHLVKGDKLLPAVESSKAFVYRAIAQ
ADQYGVRQYEANKNN*

Description:
Thi protein - Rhizobium meliloti 

Assembly ID: 3113716 
Assembly Length: 456bp 

>[SEQ ID NO: 6] 3113716 Strep Assembly -- Assembly id#3113716
CTGGATACTAAGAGAAATCAAAAAAGCACTCTAGGATAGAGGCCTAAAGTGCTTAGTTTC 
AAGGCTTTACAGCCTATCATATTTAATAAAATATTACAACATCTTGTTGTAGAATTCAAC 
GACAAGTGCTTCGTTGATTTCTGGGTTGATTTCGTCGCGTTCTGGCAAGCGAGTCAATGA 
ACCTTCCAATTTTTCAGCGTCGAATGATACGAATGCTGGACGTCCAAGAGTAGCTTCTAC 
TGCTTCAAGGATTGCTGGAACTTTCAATGATTTTTCACGAACTGAGATCACTTGACCTGC 
AGTTACGCGGTATGATGGGATATCAACGCGTTTCCCGTCAACAAGGATGTGACCGCTGGT 
TTACAAATTGGACCAAACTTGACGACCAGTAGTCGCGAGACCAAGACGGTAAACAACGTT 
ATCCAAACGACGTTCCAAAAGAAGCATAAAGTTGAA 

ORF Predictions: 

ORF # Start End Direction Length 



1 94 291 R 66 aa 

>[SEQ ID NO: 93] 3113716-1 ORF translation from 94-291, direction R
VISVREKSLKVPAILEAVEATLGRPAFVSFDAEKLEGSLTRLPERDEINPEINEALVVEF
YNKML*

Description:

30S RIBOSOMAL PROTEIN S4 (BS4). - BACILLUS SUBTILIS. 

Assembly ID: 3174176 
Assembly Length: 1961bp 

>[SEQ ID NO: 7] 3174176 Strep Assembly -- Assembly id#3174176 

CTAATATAGAATAATCACCGCCGTTGTGAAAGAACGATTGGATGATAATCCAATCGTTCA 

GGGAAATTGGAAGACCTTGGGTTTCCAATTTAGGCATGAGACACCTTTGGTGGCTGCTGC 

CGTCCCTCACAAGCTAAGGTGATTGTTGAAAAAGAGGAAAAAGGAGAAGAAATGAAACCA 

GTAATTTCCATCATCATGGGCTCAAAATCCGACTGGGCAACCATGCAAAAAACAGCAGAA 

GTCCTAGACCGCTTCGGTGTAGCCTACGAAAAGAAAGTTGTTTCCGCACACCGTACACCA 

GACCTCATGTTCAAACATGCAGAAGAAGCCCGTAGTCGTGGCATCAAGATCATCATCGCA 

GGTGCTGGTGGCGCAGCGCATTTGCCAGGCATGGTAGCTGCCAAAACAACCCTTCCAGTC 

ATTGGTGTGCCAGTCAAGTCTCGTGCTCTTAGTGGAGTGGATTCACTCTATTCTATCGTT 

CAGATGCCGGGTGGGGTGCCTGTTGCGACCATGGCTATCGGTGAACTCTTTTTTAGGATA 

TAAAACAGGGTTCGGATAAGTTTTTTTGCAAGGTGGATGATGGCTACATTGTAATGTTTT 

CCTTGTTCTAACTTAGTCTTAAAAGCAGGTGAAAAGTGAGGGCATGCTTTGGCAGCTTGT 

ATGAGTACCTACCGCAGATAAGGGGAACCCCGTTTGACCATCCTCCCAGCTAAATCAATC 

TGACCTGACTGATAAATAGAAGAATCCAGTCCAGCGAAAGCTTGTAATTGAGCAGGATTA 

TCAAAGGCATGAATATTTCGAATCTCGGCTAAAATGACCGCCCCTAAACGATTCTCAATC 

CCAGTAACCGTCGTGATGACCGAGTTTAACTCAGCCATCAAGTCATTGACACATTTTTCC 

GCCTTGTCAATGAGCCTCTTGTAATGTTTGATGTTTTCATTACACGAGATAAAACGTCTA 

TGCGTTATCAAACTCATTACCAATTAAAACAAATGTGGTTAGATCCTTTCGGAAATTGTC 

AAGCGATTGGAGGAAATGAACTAATCCACAGCGGCTTATTCCAAGTATACCACTTGGGCT 

TTGGCAGTAGCTAACTGCGCTAAATATAATATAAGGAGGAGTAAAATGAAGACAGTTCAA 

TTTTTTTGGCATTATTTTAAGGTCTACAAGTTCTCATTTGTAGTTGTCATCCTGATGATT 

GTTCTGGCGACTTTTGCCCAAGCCCTCTTTCCAGTCTTTTCTGGACAAGCGGTGACGCAG 

CTAGCCAATTTAGTTCAAGCTTATCAAAATGGGCAATCCAGAACTTGTATGGCAAAGCCT 

ATCAGGAATTCATGGTCAATCTTGGCCTGCTGGTTTTGGGTTCTATTTATCTCTAGGTGT 

AATATAAACATGTGTCTCATGACGCGCGTGATTGCAGAATCGACCAACGAGATGCGCAAA 

GGTCTCTTTGGTAAGCTTGCTCAGTTGACGGTTTCTTTCTTTGACCGTCGACAAGATGGC 

GATATCCTGTCTCATTTTACCAGTGATTTGGATAATATCCTCCAAGCCTTTAACGAAAGC 

TTGATTCAGGTCATGAGCAATATTGTTTTATACATTGGTCTGATTCTTGTCATGTTTTCG 

AGAAATGTGACGCTGGCTCTCATCACCATTGCCAGCACCCCATTGGCTTTCCTTATGCTG 

ATTTTCATCGTGAAAATGGCACGTAAATACACCAACCTCCAGCAGAAAGAGGTAGGGAAG 

CTCAACGCCTATATGGATGAGAGCATCTCAGGCCAAAAAGCCGTGATTGTGCTAGGAATT 

CAAGAGGATATGATGGCAGGATTTCTTGAACAAAATGAGCGCGTGCGCAAGGCAACCTTT 

AAAGGAAGAATGTTCTCAGGAATTCTTTTCCCTGTCATGAATGGGATGAGCCTGATTAAT 
ACAGCCATCGTCATCTTTGCTGGTTCGGCTGTACTTTTGAA

ORF Predictions: 

ORF # Start End Direction Length 



1 139 543 F 135 aa 

>[SEQ ID NO: 94] 3174176-1 ORF translation from 139-543, direction F 
VIVEKEEKGEEMKPVISIIMGSKSDWATMQKTAEVLDRFGVAYEKKVVSAHRTPDLMFKH
AEEARSRGIKIIIAGAGGAAHLPGMVAAKTTLPVIGVPVKSRALSGVDSLYSIVQMPGGV
PVATMAIGELFFRI*

Description: 

PHOSPHORIBOSYLAMINOIMIDAZOLE CARBOXYLASE CATALYTIC SUBUNIT (EC
4.1.1.21) (AIR CARBOXYLASE) (AIRC). - BACILLUS SUBTILIS.

Assembly ID: 3174186 
Assembly Length: 375bp

>[SEQ ID NO: 8] 3174186 Strep Assembly -- Assembly id#3174186
CTATCTCCAAGTNCGNTTGGAATNCCTCCGCNANCCACAACTCATCCAAGCACTTTNCAA 
CGTGNCCTGGTCCGGTCCTCCAGTGCGTCTNACNGCACCTTCAACCTGCNCATGGGTAGG 
TCACATGGCTTCGGGTCTACGTCATGATACTAAGGCGCCCTATTCAGACTCGGNTNCCCT 
AGGGCTCCGTCTCTTCAACTTAACCACGCAACAGAACGTNACCCGCCGGTTCATTCTACA 
AAAGGCAGNCTCTCACCCATTAACGGGCTCGAACTTGTTGTAGGCACACNGCTTCAGGTN 
CTATTTCACCCCCCTCCCGGGGAGCANCTCAACTGACCCNCACGGCACCGGTGNANNAAA 
CGGTCACTTAGGGAG 

ORF Predictions:

ORF # Start End Direction Length 



1 83 283 F 67 aa 

>[SEQ ID NO: 95] 3174186-1 ORF translation from 83-283, direction F 

VRXXAPSTCXWVGHMASGLRHDTKAPYSDSXXLGLRLFNLTTQQNXTRRFILQKAXSHPL 
TGSNLL* 

Description: 
unknown 

Assembly ID: 3174374 
Assembly Length: 665bp

>[SEQ ID NO: 9] 3174374 Strep Assembly -- Assembly id#3174374
GGGGGGGGTNNNTTCTGGGGCCGGGTGNNTCCTNGAAAAAATGCTGGACTTAACGGTTN
ATCATTTGAATTGGCCTGTGGATTTTAGCTAGCAATCCAGAGCGAGTTTTCTCCAAGACA 
GACCTCTATGAAAAGATCTGGAAAGAANACTACGTGGATGACACCAATACCTTGAATGTG 
CATATCCATGCTCTTCGACAGGAGCTGGCAAAATATAGTAGTGACCAAACGCCCACTATT 
AAGACAGTTTGGGGGTTGGGATATAAGATAGAGAAACCGAGAGGACAAACATGAAACTAA 
AAAGTTATATTTTGGTTGGATATATTATTTCAACCCTCTTAACCATTTTGGTTGTTTTTT 
GGGCTGTTCAAAAAATGCTGATTGCGAAAGGCGAGATTTACTTTTTGCTTGGGATGACCA 
TCGTTGCCAGCCTTGTCGGTGCTGGGATTAGTCTCTTTCTCCTATTGCCAGTCTTTACGT 
CGTTGGGCAAACTCAAGGAGCATGCCAAGCGGGTAGCGGCCAAGGATTTCCCTCCAATTT 
GGANGTTCAAGGTCCCTGTTAAATTTCCCCCATTTAGGGGCAACCTTTTAATGAAAJSTTTT 
CCNTNATTTGCCGGGTANCTTTGAATCCCTNGGAAAAAACCCAACNAAAAAAAGGGCTTA 
NNCCC 

ORF Predictions: 

ORF # Start End Direction Length 



1 154 294 F 47 aa 

>[SEQ ID NO: 96] 3174374-1 ORF translation from 154-294, direction F 
VDDTNTLNVHIHALRQELAKYSSDQTPTIKTVWGLGYKIEKPRGQT* 

Description:

REGULATORY PROTEIN VANR. - ENTEROCOCCUS FAECIUM (STREPTOCOCCUS
FAECIUM).

Assembly ID: 3174972 
Assembly Length: 989bp 

>[SEQ ID NO: 10] 3174972 Strep Assembly -- Assembly id#3174972

CTACGATATCTTTGGTCTTTTGTAAGATATGAGGTCCACCCTTATGCGCCTCAGTTGGCA 

TTTCATGCGATTCAAGAAGTTGCCCCTCTTGATCAACCAAACCATACTTGATGTTGGTTC 

CACCGATATCAATTGCAACGTAATATGTCATAAATACCTCCTTTTAGATTAGAGGAAGCG 

CTCCTTGGTTTCACGAATCAAGGCAGCAGCCGCTTCTACAACTGGACGATCTTCTTCAGT 

CACTGGTGTCAATGGTGAACGAACAGATCCAATATTCAAGCCTTCATTGATTTTCAAGAC 

TTCTTTGATGACACCGTACATATTTCCATGAGCAGAAGTGAGTTTACCAATGATTGCGTT 

GATAGCATACTGCAATTCACGCGCTGTTTCTAGGTCCTTATCCGCAATCAACTGATTGAG 

TTTCAAGAAGAGTTCTGGCATAGCACCATAAGTACCACCGATACCAGCCCTAGCCCCCAT 

GAGGCGTCCTCCTAGGAACTGCTCATCAGGACCATTAAAGACGATATGGTCTTCTCCACC 

AAGGCTGACAAAGGTTTGGATATCTTGAACTGGCATAGAAGAGTTCTTCACACCGATAAC 

ACGAGGATTTTTCAACATTTCTGTGTAAAGGCTTGGAGTCAAAGCAACCCCTGCCAATTG 

AGGAATGTTGTAAATCACGTAGTCTGTGTTTGGAGCTGCAGAACTGATATCGTTCCAGTA 

TTTGGCAACTGAGTTATTCTGGCAAGCGGAAATAAATTGGTGGAATCCGTTGCAATAGCA 

TCTACTCCCAAGCTTTCAGCATGGCGAGCAAGTTCCATACTATCTTTAGTATTATTGCAA 
GCAACATGGGCAATAATGGTCAATTTACCTTTGGCTACCGCCATGACTTCTTCCAAAATC 
AACTTGCGATCTTCAACGCTTTGGTAGATACATTCACCAGAAGAACCATTGACATAAGAC 
CTTGAACACCTTTATCAATGAAGTATTGA 

ORF Predictions: 

ORF # Start End Direction Length 



1 169 678 R 170 aa 

>[SEQ ID NO: 97] 3174972-1 ORF translation from 169-678, direction R 
VIYNIPQLAGVALTPSLYTEMLKNPRVIGVKNSSMPVQDIQTFVSLGGEDHIVFNGPDEQ
FLGGRLMGARAGIGGTYGAMPELFLKLNQLIADKDLETARELQYAINAIIGKLTSAHGNM
YGVIKEVLKINEGLNIGSVRSPLTPVTEEDRPVVEAAAALIRETKERFL*

Description:

N-ACETYLNEURAMINATE LYASE SUBUNIT (EC 4.1.3.3) (N-ACETYLNEURAMINIC
ACID ALDOLASE) (N-ACETYLNEURAMINATE PYRUVATE LYASE) (NALASE). -
ESCHERICHIA COLI.

Assembly ID: 3175138 
Assembly Length: 1450bp

>[SEQ ID NO: 11] 3175138 Strep Assembly -- Assembly id#3175138

CTCCATATTTCTTAGCCTTCTCAATTAGGGTCTTGAAGTCTTCGACACCACCGATACGCT 

TACCAATATCAGCATAGTTCAAGTGACCAGAGTCATGGCTGTGATATCCTTAACTTTTTC 

CCAACCTTGAGGGTTGTTCATAATGCTACGATAAGCAATGGCACCATCTTGCCAATCAAC 

TTTCTTGTCTGCATTGGCATCTTCAGTGATAACAACCTTAGCACTTGGAAGTTCCTTCGT 

GTATTCTGGGAAAACAATGCCCTTATAAGCTTTTTCCCATTGCCATTCAGAGCTGTGGAT 

TCCTACATAGTTGGCATTTCCGACTGTTTCTTTATAAGCTGTCAAACGAGTCCAGTCATT 

CGAACCACCACCATAGCTATTTTGAGAGTTACTCCAAACACCAGCAGCAAGCTTATCTGT 

AGAAACAAATCCATACATGTAACCCTTAGCCAAATCCTTCATTGGATTGGTTACATCGAT 

ATGATCATCTCCGCTGACATGCGTATTGTTTGACATGGTTGCCCCATCAAACTTAGCACC 

AGTTTGATCACTAGAAACAGAGACTAAAGCATTGCCGAGGAAACTAATAGAAGAAAGTAG 

TTTTCTTTCGTCATCAATCTTTTGACCTGGAGTGACTTGATTGTGGTTGACAATCTTGGT 

CACATCAAAGTGCAATTGATTGTCCACAACTTGCAAGCGTACTGTCATTTCCGCATTGAT 

TAAGTGAGCATCATCGCGAAGCTTCATCAAGTACTCTGCTGTTGTCTCATTGATTTTTTT 

ATAAGTGACTTCAGGGGTGATTCGGTGGTTATTGATAAAGACTTGGTTGAATTGTTGCAC 

CTGTCCTGGCAAAGTATGTCCATTCAAGGTGTATCCCTTGACACGAAGGAAGGCTTGGTC 

AATTACTGCCTTAAGTACCTTAAACTGGATCGTATCATAAGTCACCTTGCTATCGTCAAC 

AACCGGACCTGTTTCTTTCTGGGCAGGGGTATCCTCTGGGTTTTACCCTCTCTGTGGCTA 

TCCGTTTCAACGCTTGAACAACTGGTCGCTCATCGTCATAAGAGCCCGCCTTGAGAAAAA 

TCTTCTTCTCATTTCTAAGATGGTCATTGACCGCAGCTGGTAGAGTCACTGTGTCAAAGA 
AGATTGACATCCTTATTTGCCTGGCATTTACCTGACCGTCTGACTTGAAGACTGATAGAG
AGACGGTTTGTTGATCCTGTTTCAGGAGCAGCAACACGACTACCTCTATACCAAGTGCTA 
GTTGTTGGAGATTTATACTCCCAGAACCAGCCATCCTTGTCATAACCGACAAAAACATTA 
TTATTGGTATCTTTAAATTTCAAGGAGACACCAAAGCGTGATTTGCCCTTTTCAGAATCT 
TCTTTGAAGGTTAAATCAACAGTTGCATTTCCATTGGCATCAACGGTCAAGCCCTTCTTT 
TCAAACAGAG

ORF Predictions: 

ORF # Start End Direction Length 



1 79 945 R 289 aa 

>[SEQ ID NO:98] 3175138-1 ORF translation from 79-945, direction R 

VTYDTIQFKVLKAVIDQAFLRVKGYTLNGHTLPGQVQQFNQVFINNHRITPEVTYKKINE 

TTAEYLI^LRDDAHLINAEMTVRLQVVDNQLHFD 

SFLGNALVSVSSDQTGAKFDGATMSNNTHVSGDDHIDVTNPMKDLAKGYMYGFVSTDKLA 
AGVWSNSQNSYGGGSNDWTRLTAYKETVGNANYVGIHSSEWQWEKAYKGIVFPEYTKELP
SAKWITEDANADKKVDWQDGAIAYRSIMNNPQGWEKVKDITAMTLVT* 

Description: 
unknown 

Assembly ID: 3175860 
Assembly Length: 420bp

>[SEQ ID NO: 12] 3175860 Strep Assembly -- Assembly id#3175860

CTGCGAGTTGTGAGGCTCCTATTATGTCTCGTGATTAAAATCTCTATAAGGTGATTTTGG 

AGGGAAATTATCGGGCGACAGCGGGTAGAGAAGAGATGAAAGAGGCTATTTTGGAATATC 

AAGCAAATCCTGCTGCCTTAAAAGATCTCAAAGAAAAGGCTAAGAATATTTCCAGAGAGT 

ATTCTGAAGAGCATCTGTTACAAATCTGGTTGGACTTTTATGAGAAACAAGCCGCTTTAG 

GGACAAAGTAAAAAGTGAGGTAATCTATGCGAATTGGTTTATTTACAGATACCTATTTTC 

CTCAGGTTTCTGGTGTTGCGACCAATATCCCAACCTTGAAAACCCACCTTGAAAACACGG 

ACTTGCCTGCATTTNTATCTCATACAATCCACCGAATTTCGATGTCCCCCTCCCTACAAC 

ORF Predictions:

ORF # Start End Direction Length 



1 51 251 F 67 aa 

>[SEQ ID NO:99] 3175860-1 ORF translation from 51-251, direction F 
VILEGNYRATAGREEMKEAILEYQANPAALKDLKEKAKNISREYSEEHLLQIWLDFYEKQ
AALGTK* 

Description:
unknown 

Assembly ID: 3175918 
Assembly Length: 661bp

>[SEQ ID NO: 13] 3175918 Strep Assembly -- Assembly id#3175918

CTCCCCAAACTTTTATTTGAGAGTGAACGGTATAAGAATATGAAACCGGAGGTTAAGGTG 

GTTTACTCAGTTTTAAAAGATCGGTTGGAGTTGTCTTTGAGCAAAGGTTGGATTGATGAG 

GATGGGACTATTTATTTGATTTATTCCAATTCAAATTTGATGGCACTTTTAGGCTGTTCA 

AAGTCAAAATTACTCTCCATGTGAGTTTGAAGTGACATTTTTAGATGATTACCATAAAAA 

ACATAACTACCCACTATTTTACGAATCCTATCTTCAAAACGTTATGGAATTCCTTGAAAG 

TCAAGACATAAAGAATGGGGTTGATGCCTTTGTAGATGATCATCAAAATCTCGTTTTTGT 

TTTATATGGACAAGGCTATCGAGCCGAGGGAAAAGAGGGAATACTTACAACCCAAGTAAC 

TGTAAAAGCTTATGATGAAGACAAGAAACCGATTAACTTCGCAAATTTATTAGATTCCTT 

AATCGTGTCAGAATATCAAATGGAACCGAATCTTTGGGAGGTCTCCTATGATTGATCTCT 

ATCTAAGTAAAAATAGCCGAAGAAATCAACTTCTTTTAGACTTCTTCCAAAACTATGGCA 

TCGAGGTATCTTGTCATTCAGTTTCTGAAATGACAAAGGACAAATTAATTGAGATGATGA 

G 

ORF Predictions: 

ORF # Start End Direction Length 



1 212 535 F 108 aa 

>[SEQ ID NO: 100] 3175918-1 ORF translation from 212-535, direction F 

VTFLDDYHKKHNYPLFYESYLQNVMEFLESQDIKNGVDAFVDDHQNLVFVLYGQGYRAEG
KEGILTTQVTVKAYDEDKKPINFANLLDSLIVSEYQMEPNLWEVSYD*

Description:
unknown 

Assembly ID: 3811220 
Assembly Length: 1429bp 

>[SEQ ID NO: 14] 3811220 Strep Assembly -- Assembly id#3811220 

CTGCCCCTGTAAGGCTGGACGATTGCCTTTCTTAGTATCCGCAAAGAGGTAAACTGAGAA 

TAGAGAGGATTTCTCCTTCAATATCTTTGACAGACAGGTTCATCTTGCCTTCTACGTCTG 

AAAAAATCCGCATATTGACCAGTTTTCTCACAGCATAGTCCAAATCTTCCTCTTGGTCCT 

CTGGTCCAACACCAACCAGCAATAAAAGTCCCTGATTGATTTTTCCCTGAATCTGGCCTT 

CTATACTCACTTGGGCTTTTTTAACCCGTTGGATAATGATTTTCATAATAGCCTTTCTAG 

TAAGAGCTAGGACAACTAGCCGTTGGTCCGTTTGACAGAGTAAACTTCTGGCACACTCTT 

AATTTTATCGACAACCGTGGTCAGTGTAGAGAGGTTGGCAATACCGAAGGACACATGGAT 
ATTAGCAAACTTCATATCCTTGGTTGGTTGGGCATTGACCGTTGAAATATTCTTGGTTGT
ATTTGAAAGAACTTGCAGTACATCGTTCAACAGTCCTGTACGGTTGAGACCGTAGATATC 
GATATGGGCCATATACTCCTTATTTGAGCTAGAGTACTGGTCTTCCCATTCCACATCAAG 
GAGACGTTGCTCGTAGTTTTCTTGGGCACGCAGGTTCATACAGTCCACACGGTGAATAGC 
CACACCACGACCCTTGGTAATGTAGCCAACAATATCGTCACCAGGCACGGGGTTACAACA 
CTTAGCAATCCGCACTAGGAGACCAGAAGCACCTTCAATAACCACTCCCCCCTCATGCTT 
GACCTTGGAGAGTTTCTTTATTTTCAACCTTGACCTCGCCACCTTTGACAAGCTCCTCTG 
CCTCAGCCTTGGCCTTGGCACGCTCTTCCTCACGGCGTTCTTTTTCAGTCAGACGGTTAA 
AGACGGTAATCGCACCGATTTCCCCAAAACCAATGGCCGCAAAGAGGGAGTCTTCTGTCT 
TGTAACTGGTCTTTTGCAGAACTTGATCCATGTGGCGCTTGTCCATAAATTTATTTGCCA 
CATAGCCATTTTCTTGGAACTGAGCCATCAGCATCTCACGACCCTTGTTGACAGACAATT 
CCTTATCTTGGTTTTTAAAGAACTGGCGAATCTTATTGCGCGCCTTGCTAGTCTTGACCA 
TATTGAGCCAGTCACGGCTAGGTCCAAAGGAGTTCGGGTTGGCGATAATTTCAACCTGAT 
CCCCTGTCTTTAACTTGGTTGTCAGTGGAACCATGCGGCCATTGACCTTGGCACCAGTTG 
CTTTTTCACCGACCTTGGTATGGATTTCGTAGGCAAAATCAATCGGTCCTGAATCTTTGG 
GAAGAGAACGGACAGCTCCATCTGGGGTAAAAACGTAAATCTCCTCAGCCAGATAGTTTT 
CCTTAACAGAGTCCACAAATTCCTTAGCATCATCAGCCTGGTCTTGGAG 

ORF Predictions:

ORF # Start End Direction Length 



1 316 873 R 186 aa 

>[SEQ ID NO:101] 3811220-2 ORF translation from 316-873, direction R

VRKSVPRPRLRQRSLSKVARSRLKIKKLSKVKHEGGVVIEGASGLLVRIAKCCNPVPGDD

IVGYITKGRGVAIHRVDCMNLRAQENYEQRLLDVEWEDQYSSSNKEYMAHIDIYGLNRTG 

LLNDVLQVLSNTTKNISTVNAQPTKDMKFANIHVSFGIANLSTLTTVVDKIKSVPEVYSV 

KRTNG* 

Description: 

stringent response-like protein - Streptococcus equisimilis

Assembly ID: 3811436 
Assembly Length: 1513bp 

>[SEQ ID NO: 15] 3811436 Strep Assembly -- Assembly id#3811436

CTCTGCAATGATGTACTCAAACATCTCCGCTTCTAGTTCCTCCTTAGGCAGAGGCAATTT 

CCCACGTCGCATCCGGTTCATAAAGACCGTATGGTTTTCTAAAATCAAACTATACAAACT 

CATGTGGGGAATATCCAATCCAATGGCTTTAGCCACATTTTCCTTTACTTGCTCCATGGT 

CTGACCAGGCAGAGCATAAATCAAATCAATGGAGATGTTGTCAAAACCAGCCAGTTTCAG 

GCGATCGATATTTTCATAAATATCCTTCTCCAAATGACTGCGCCCAATCTTTTTCAACAT 

CTTATCATCAAAGGTCTGGACACCTAGCGAAACACGATTGACAGCCGAATTTTTCAAAAC 

AGCTATCTTATCCGCATCCAAATCGCCTGGATTGGCTTCAATGGTCAACTCTTCCAAGAC 
AGACAAATCCAAGTTTTTAGTCAAGCCATTCAGTAACACCTCCAGTTGCGGAGCCGACAG
GGCTGTCGGTGTTCCACCACCGATATAAAGGGTTGACAACTTTTCAATATCATAAGAACG 
AAACTCTTCCAGCAGATGCTCTAAATAGCTGTCGACTGGCTGATTTTTGATGAAGACCTT 
TGAAAAATCACAATAATAACAAATCTGGGTACAAAATGGGATGTGCACATAGGCTGACGT 
TGGTTTTTTCTGCATAGTAATTATTATACCACAAAGACTAGATTCCAGATAAAAATCACC 
ATCCCCAGATACATAGTCCGTCCGGAGATGGTGATGGTTTATTCTTCTGTTATATCAATC 
ACAATCTCTTCTGAGTCATCAAGAGCTTCGGCTTTTTCTTGCCATTGTTCCTTGAGATTA 
TTTAATTGATTTTTTGATGCTTCTGTCGCTTGAAAAGCATAGGATTTAGCTTGAGCAAGT 
ATACTGTCCACAGTGATTTCACCTGACTCAACCTGTTCTTTTGTTTTCAGAACAAAATCT 
GTAGCCTGCTCCTTAACTTCTGTCAGTTTTTCACAGACTTGCTCCTTGGCATACTCCGGA 
TCTTCTCTCAAATCATCTAAAAAATCTTGAGCCTGACTGCAAACTTGTTTGCCCTTATCA 
CTTGTTAAAAACAAGGCAAGAGCTGCACCTGAAACGGTTCCTAAAAGGATTGAGGATAAT 
TTACCCATAAGGATTCTCCTTTTTTATTTTTTGAAAAATTTACTTGCAAGACGAAGAGCT 
GACAGACTTGCACCAGTCTTGAGTGTTTTTGAACCAGCTGATGAAGCTTTCTTGCTCAAG 
ACACGCGCATGGTCATTGAGGTCTGAAACAGATAGAGATAAATCTGCAACAGCACTGAAG 
AGTGGATCAATCGTAGCCACCTTGACATTGATATCATCTGCCAAGACATTGACCTTAGCC 
AACAACTCATTGGTGTGATGCAAGGTCACATCCACATCTGAAGTCAAGGTTTTAATCGTC 
TTTTCTGTTTCATCGATGACACGACCAAGCTTTTGTACAGTAATGATCAGATAGACCAAA 
AAGACAATCACAG 

ORF Predictions : 

ORF # Start End Direction Length 



1 1164 1511 R 116 aa 

>[SEQ ID NO:102] 3811436-3 ORF translation from 1164-1511, direction R 

VIVFLVYLIITVQKLGRVIDETEKTIKTLTSDVDVTLHHTNELLAKVNVLADDINVKVAT 

IDPLFSAVADLSLSVSDLNDHARVLSKKASSAGSKTLKTGASLSALRLASKFFKK* 

Description: 
unknown 

Assembly ID: 3811984 
Assembly Length: 505bp

>[SEQ ID NO: 16] 3811984 Strep Assembly -- Assembly id#3811984
CTCTTGTCAGAGAAATTTACAAAACGTTAGGAGAATAAGATGGCATTTATTGAAAAAGGT 
CAAGAAATCGATATGGAAGTCATCAAGGCTGAAACCCAATTGTCTGCAGAAGCCTTGAGA 
CTCAAGGAAAGCCGTGACAGGGAATTGGCAGATATTATTTCAGGGGAAGATGACCGTATT 
CTCTTGGCTGATTGGTCCTTGCTCTTCTGATAATGAAGAGGCGGTCTTGGAATATGCTCG 
CCGTTTATCCGCCTTGCAAAAGAAGGTAGCGGATAAGATTTTCATGGTCATGCGCGTGTA 
TACTGCTAAGCCTCGTACCAATGGAGACGGCTATAAAGGGTTGGTTCACCAGCCAGATAC 
TTCTAAGGCTCCAACCCTGATTAACGGCTTGCAGGCTGTGCGCCAGTTGCACTACCGCGT 
TGATTACAGAGACTGGTTTGACAACGGCAGATGAGATGCTTTATCCGTCAAATCTGATCT
TGGTGGATGACTTTGGTCACCTACC 

ORF Predictions: 

ORF # Start End Direction Length 



1 134 454 F 107 aa 

>[SEQ ID NO: 103] 3811984-2 ORF translation from 134-454, direction F 
VTGNWQILFQGKMTVFSWLIGPCSSDNEEAVLEYARRLSALQKKVADKIFMVMRVYTAKP
RTNGDGYKGLVHQPDTSKAPTLINGLQAVRQLHYRVDYRDWFDNGR* 

Description: 

PHOSPHO-2-DEHYDRO-3-DEOXYHEPTONATE ALDOLASE, TYR-SENSITIVE (EC
4.1.2.15) (PHOSPHO-2-KETO-3-DEOXYHEPTONATE ALDOLASE) (DAHP
SYNTHETASE) (3-DEOXY-D-ARABINO-HEPTULOSONATE 7-PHOSPHATE SYNTHASE).
- ESCHERICHIA COLI.

Assembly ID: 3857228 
Assembly Length: 1827bp 

>[SEQ ID NO: 17] 3857228 Strep Assembly -- Assembly id#3857228 

CTCTTTTAACCGTTTTAGCGGTGACACCGAGGATTTTTTCAGGACCCAAGACTTGTCGGG 

CAACCGAAACTGGGAGTTCGTCATCTCCAATATGCAGACCAGCAGCATCAACCGCAAGAC 

AAACATCCAACCGATCATCGATTATCAAGGGGACCTGATAGGCATCTGTTATTTCCTTGA 

CTTGTTTTGCCAGTTGATAATATTGATTGGTTGTGAGATTTTTTTCTCGCAATTGGACTA 

TGGTAACCCCTGAACGGCAGGCCGTCTCAACTTTTGCAAGAAAGCTTTCCACGGAATCTT 

GATAGCGATTGGTTACCAGATATAGTCTAAGCGCTTCTCTATTCATAAACCTCTCCTTTG 

ATGGTATCTAGCCAATTTTCATCTCTTCTTAGGAGCGAAAGCTGATTGAGTACTTGGTAA 

CGAAATTCTTCCAATCCCATTCCTTGAACAACTATTTTCTCAGCAGCGATATTGAGATAA 

GAGACTGCTAAGCAAGAACTTCAAAACCAGTCTTTCCTTGGCTGAGAAAAACAGCTGTTA 

AGGCTCCAACCAAGTCTCCTGTCCCTGTTATCCAGTCTAATTCAGTACAGCCATTCTCAA 

GTACAGCAACTTGATTCTCCGAAACAATAAGGTCCTTGGGACCTGTGACTAAGAATGACA 

TACCACGATAGGTCTGACACCAGTCTTTCAAGACTTGAAGCAAATCCTCCGTTTCTTGAT 

CTTTAGCACTCGCATCGACCCCAACGCCGTGATGCTTTAATCCAACAAGACTTCGAATTT 

CTGACATGTTTCCTTTAAGGACCGTAGGTCTATAGTCTAAAAGGTCTTTAACTAAGCTCT 

TACGAATGGATGAAGTCGTTACGCCAACCGCATCTACTACCATCGGGAGAGAAGATTGGT 

TTGCATACAAAGCTGCCATGCGGATTGCTTTTTCCTTCTCAGCTGACAAATGCCCCAAAT 

TGATGAAGAGAGCCTGGCTTTGCTTAGTAAAATCAAGAACTTCACGGGGATCATCTGCCA 

TGACAGGTTTGCATCCCAGAGCCAAAATCCCATTTGCCAGCATCTCACAAGAAATCTCAT 

TGGTCATACAGTGAATGAGGGAACTAGAGCCTATAGGAAAAGGATTTGTCAATGCCTGCA 

TCATTCTATCCTTTCAGCAAAGAAATATCCTTGCACTTTTTTAAAGAATTCCTGCTTGAT 

TAAAAATCTAAATGCAATAAAGGAAATCGCTGTACCAATCAAGGTTGCTCCGAAAAATCG 
AGGCGTGTAGATAAACCAACTAAGCTTAGCAGCCGATCCTGTAAAGAGCACCATAACAGG
ATAGGAAACAATAGAACCAATAATACCTGTTCCCACAATTTCTCCCAAGGCAGAAAAGTA 
AAATTTTCGACCGTACTTATAAAAGAGACCTGCTAGAAGGGCTCCAAAAGTCGCTCCTGT 
GAGAGATAAAGGAGCTTATCGGAATACCCTTGAGTCGTCATACGGATAAAGGCTGTCACT 
GTAGCCATAGCCAAGGCATAAACAGGTCCCATCATGATTCCCGCTAGAATATTGACTACA 
CTGGACATCGGTGCCATTCCCTCAATCCGAAAGATAGGTGTAAGGACTACATCAAGGGCA 
ATCATCATAGATAAAATGGTCAATTTGTGAACTTGTAGTTGGTGCTTTCTCAAGTTTCTA 
TTCTTCTCCTTTTTCTAAAGACTGTAAATCGCTCTTCCATGTCTGGTGTTGGTAAGCCAT 
CTCCCAAAACTTGGCTTCCATATGAACACTGATGTGGAAGGCATCTAGCATTTTTTGCTT 
ATCTGTCTCATCACTTTCTCGATAGAG 

ORF Predictions:

ORF # Start End Direction Length 



1 1141 1356 R 72 aa 

>[SEQ ID NO:104] 3857228-2 ORF translation from 1141-1356, direction R 

VGTGIIGSIVSYPVMVLFTGSAAKLSWFIYTPRFFGATLIGTAISFIAFRFLIKQEFFKK 
VQGYFFAERIE*

Description: 
unknown 

Assembly ID: 3857842 
Assembly Length: 485bp

>[SEQ ID NO: 18] 3857842 Strep Assembly -- Assembly id#3857842 

CTATTGCCAATCCATATAGCCTATCAGGTGGTCAATAACAACGTGTGGCCATCGCTCGTG 

GCCTATCAATGAATCCAGACATCATGCTCTTCGATGAACCAAATTCTGCCCTTGACCCTG 

AGATGGTTGGAGAAGTAATTAACGTTATGAAGGAATTGGCTGAGCAAGGCATGACCATGA 

TTATCGTAACCCATGAGATGGGATTTGCCCGCCAGGTTGCCAACCGCGTTATCTTTACTG 

CAGATGGCGAGTTCCTTGAAGACGGAACACCTGACCAAATCTTTGATAACCCACAACACC 

CTCGTCTGAAAGAGTTCTTAGATAAGGTCTTAAACGTCTAAACTCAAACTGCAAGGATTT 

CCTTGCAGTTTTTCTACCTCGTATTGGAATTTTTGATTTTTCGGAAAATTATGTTAGAAT 

TAAGTTTATGAAATGAGGTTTCCTCATACCTAGCAAGACTAGGAATAAAAATAGAAATTA 

GGTAG 

ORF Predictions:

ORF # Start End Direction Length 



1 45 341 F 99 aa 

>[SEQ ID NO: 105] 3857842-1 ORF translation from 45-341, direction F 
VAIARGLSMNPDIMLFDEPNSALDPEMVGEVINVMKELAEQGMTMIIVTHEMGFARQVAN
RVIFTADGEFLEDGTPDQIFDNPQHPRLKEFLDKVLNV*

Description: 

GLUTAMINE TRANSPORT ATP-BINDING PROTEIN GLNQ. - BACILLUS
STEAROTHERMOPHILUS.

Assembly ID: 3857996 
Assembly Length: 1547bp 

>[SEQ ID NO: 19] 3857996 Strep Assembly -- Assembly id#3857996

NTCTTGGGCNCNGGGCGNNTCCTTTGAGGACNACGGTATCGATGACCTTGATCTCAAGTG 

CAAGCAGTATCTGAATCTGCAGCAGCACCTGTCCGTGCAAAAGTTCGTCCAACATACAGT 

ACAAACGCTTCAAGTTATCCAATTGGAGAATGTACATGGGGAGTAAAAACATTGGCACCT 

TGGGCTGGAGACTACTGGGGTAATGGAGCACAGTGGGCTACAAGTGCAGCAGCAGCAGGT 

TTCCGTACAGGTTCAACACCTCAAGTTGGAGCAATTGCATGTTGGAATGATGGTGGATAT 

GGTCACGTAGCGGTTGTTACAGCTGTTGAATCAACAACACGTATCCAAGTATCAGAATCA 

AATTATGCAGGTAATCGTACAATTGGAAATCACCGTGGATGGTTCAATCCAACAACAACT 

TCTGAAGGTTTTGTTACATATATTTATGCAGATTAATTTACAGAGGGACTCGAATAGAGC 

CCTCTTTTCAGGTTTTACCGTGACAATCCCTATTAAAAATTATATCAAAATCGTGAAAAT 

ATTGGAAAAGTATGGTAGAATGAAAATTGTCGTGTGAACGATAATACTCATTCTTGATGA 

ATTGTGAAGCAGTTGCCCTTGGGTCGTTTTGCGAGTTGAAGTCAAGAAGAGGAAAAAAAC 

AAAAAGGAGAAATACTCATCGAATTTCAATGAAACAACTTCTTGAGGCTGGTGTACACTT 

TGGTCACCAAACTCGTCGCTGGAATCCTAAGATGGCTAAGTACATCTTTACTGAACGTAA 

CGGAATCCACGTTATCGACTTGCAACAAACTGTAAAATACGCTGACCAAGCATACGACTT 

CATGCGTGATGCAGCAGCTAACGATGCAGTTGTATTGTTCGTTGGTACTAAGAAACAAGC 

AGCTGATGCAGTTGCTGAAGAAGCAGTACGTTCAGGTCAATACTTCATCAACCACCGTTG 

GTTGGGTGGAACTCTTACAAACTGGGGAACAATCCAAAAACGTATCGCTCGTTTGAAAGA 

AATTAAACGTATGGAAGAAGATGGAACTTTCGAAGTTCTTCCTAAGAAAGAAGTTGCACT 

TCTTAACAAACAACGTGCGCGTCTTGAAAAATTCTTGGGCGGTATCGAAGATATGCCTCG 

TATCCCAGATGTGATGTACGTAGTTGACCCACATAAAGAGCAAATCGCTGTTAAAGAAGC 

TAAAAAATTGGGAATCCCAGTTGTAGCGATGGTTGACACCAATACTGATCCAGATGATAT 

CGATGTAATCATCCCAGCTAACGATGACGCTATCCGTGCTGTTAAATTGATCACAGCTAA 

ATTGGCTGACGCTATTATCGAAGGACGTCAAGGTGAGGATGCAGTAGCAGTTGAAGCAGA 

ATTTGCAGCTCCAGAAACTCAAGCAGATTCAATTGAAGAAATCGTTGAAGTTGTAGAAGG 

TGACAACGCTTAATTTATACAAATAGTAATTACCTAGGAGGGCGGGGCTTAGCCCGGCTC 

TCCTATTTTCAAAAAATATAGGAGAATTAAAATGGCAGAAATTACAG 

ORF Predictions: 

ORF # Start End Direction Length 



1 58 456 F 133 aa 

>[SEQ ID NO: 106] 3857996-1 ORF translation from 58-456, direction F
VQAVSESAAAPVRAKVRPTYSTNASSYPIGECTWGVKTLAPWAGDYWGNGAQWATSAAAA
GFRTGSTPQVGAIACWNDGGYGHVAVVTAVESTTRIQVSESNYAGNRTIGNHRGWFNPTT
TSEGFVTYIYAD* 

Description: 
unknown 

Assembly ID: 3858236 
Assembly Length: 740bp 

>[SEQ ID NO:20] 3858236 Strep Assembly -- Assembly id#3858236

CTATAAAAAAAAGGGTAACCAGTATGGAGGATGAATGTCTGGAACTATCTGAGAATCTCG 

GATTTTGGAAATCAGACCGATCATCATGAGATAAGGAAGGAAAGCACTTGTAAAAAGCAC 

TGTAACCACGCCAGTCCCCTGTCCCAAGAGGGTGAGGTGGTAGCGTAAAACCATGCGGAA 

AAATCCCTTTTTAGTGGTTGAAATTCTCTCCTTGCTGCGACGTTCTTTTTTGACCTTCTC 

CTCACTATTAAGCAGGATCACGTCATAAAAACGAGGAAGGACCTTCTTTTTGGTCAGATA 

AAGCAGGAAGAGAGTTAGTCCTATCCAAGCGAGCAGACCCAATATGGCTTCTATTGAAAA 

AGGCTCCACTGCTATTTTGTAAAAGATATGAAGAGGATAAAGGAGAAATGGAATGTCTCT 

AACTTTGTCAACAATACTTCCAAAAGTCGACTGAAGAAAGAAGATAAATATTAAAGGTAT 

GAGAACTCCTATCCCAATCATCACATTCGAAAAAATAGACTGATACTTTCTGAAGACCCT 

AGTCTGAGCCAAGAAATGTACTGCCACTACCGTCACTAAAGTAACAGAGACAAATAATAA 

GGTCAAGGACAGTAGCATCAAAGGCAAACCCAGCCAAAGAGAAGGAGCTAGACTAATATA 

GAGGGCTAGAAAATAAGCTAGGATTGGTACAATTCCAGTTAGAGCTGGCAAGAGGACAGA 

CAGTCCTTTAGCAATTCGAT 

ORF Predictions: 

ORF # Start End Direction Length 



1 1 261 R 87 aa 

>[SEQ ID NO: 107] 3858236-1 ORF translation from 1-261, direction R 

VILLNSEEKVKKERRSKERISTTKKGFFRMVLRYHLTLLGQGTGVVTVLFTSAFLPYLMM

IGLISKIRDSQIVPDIHPPYWLPFFL* 

Description: 
unknown 

Assembly ID: 3858264 
Assembly Length: 2219bp 

>[SEQ ID NO:21] 3858264 Strep Assembly -- Assembly id#3858264 
ATCGAATTCGTTTTGCAAGTGGCGAAATGCGAACCACGTTTGTGTCTTTATAAGTTTCCA 
CGTCTTCTTTGTGGACACGACCGTTTGCACCTGAGCCAGAAACGTCGTAGAGGTTTATCC
CTAAATCATCCGCTAACTTTCTAGCTGCAGGAGTCGCTCTTAGCTTGTCATCAGCCATGA 
CCTCTCCAATTCTATTTATGATACAAAGGGCGTCAAAAGCGACTGAAAAATAGGAAATCG 
ACGATGGCTTCGATGAAGCCAAGGAGATTTATCTTTTTTTCCAAGCTTTTAGCCCGTGCT 
CTAATCTAAGATATTAAGGACGAAGAGCTCTGCACCTAAAAGATACAAAGTTCTCGTCAG 
CTTTGTTTTATTTACATAACTTATCTTATGTAACTCTATTCTTTGTTATAAGTTTTTCGG 
ATTGCATCTTTGATACTTTCAACTGTTGGAATCATTGCACATTTTTAGGTTTTGCGCATA 
AGGCATCGGCACATCTTCTCCTGCACAACGGCGGATTGGTGCATCTAGATAGTCAAATGC 
TTCTGATTCTGAAATAATAGCTGAAATTTCACCGATATAGCCACTTGTTTTGTGGGCATC 
GTTGACCAGAACAACCTTACCAGTCTTCTTCACTGAGTTTATGATGATATCCTTATCAAG 
CGGAACAAGGGTACGTGGGTCAACAATTTCAACTGAAATTCCTTCTTCAGCTAATTCTTC 
AGCAGCTTGAACCACACGGCGAAGCATTTTTCCATAAGTGACAACTGTTACATCCGTTCC 
TTGGCGTTTGATTTCACCAACCCCAAGTGGAATTGTGTAGTCTGGATCAACTGGCACTTC 
CCCTTTTTGGTTAAATTCTGACTTGTACTCAAGTATAATAACTGGGTTGTTATCACGGAT 
AGAAGACTTAAGCAGGCCTTTCATGTCCGCAGGTGTTCCAGGTGCCACAACCTTAAGCCC 
TGGAATGTGAGTAAACCAAGACTCTAGAGATTGTGAGTGCTGGGCGGCAGAGCCAACTCC 
GTTACCAGCTGCACAACGAACAGTCATTGGAACCTGACCTTTACCACCAAACATGTAACG
TGTTTTAGCAGCTTGGTTGACGATATTGTCCATGGCAATAACAGAGAAGTCCATGAAGGT 
CATATCGACGATTGGACGAAGTCCTGTCATGGCTGCTCCTGCTGCAGCTCCAGAGATGGC 
AGCTTCAGAAATCGGACAGTCACGGACACGTTCTGGACCAAATTCTTCAAGCATTCCAAC 
AGAAGTACCGAAGTCTCCTCCGAAGACACCGACGTCTTCTCCCATCAAGAACACATTTTC 
ATCGCGAACGCATTTCCTCAGACATAGCAAGGATAATGGTGTCACGGAAGGACATTGTTT 
TTGTTTCCATTTTATCTCTTTCTCCTTAGTCTGCGTAAATATCTTCAAAGGCTGATTCAA 
GCGGTGGGAATGGGCTTTCCTCTGCAAATTTAACAGAAGCTTCTACTGCTTCCTTTACTT 
GCGCTTGGATTTCTTCCAATTCTTCGGCACTTGCAATGTTATTTTCAATAAGGTAATTGC 
GGAGGTTTTCGATTGGATCTTTTTGTTTCCACAATTCCACTTCTTCACGCGTACGATATT 
TACCAGGGTCAGATGATGAGTGACCGAGCCAGCGATAAGTTACACTTTCAATCAAGACTG 
GACCATTGCCACTGCGAACATGGTCTATAGCTTTCTGAAATCCTTCATAGACATCGATGA 
CATTGTTACCGTCTTCGATGAACATTCCAGGAATTCCATAAGCGGCGCTACGTTGATGGA 
TATGTTCTATATTGGTCATTTTCTTGATATCCGCAGAGATACCGTAACCGTTGTTAATGC 
AATAGAAAATGACTGGCAGGTTCCAGATAGAAGCCATGTTCACTGCTTCGTGGAAAACAC 
CTTCATTGGTCGCACCATCTCCAAAGAAGCAGACAACGATTTTACCGGTATTTTGCATTT 
GCTGACTGAGGGCTGCACCGACAGCGATCCCCATACCACCACCTACGATACCATTGGCAC 
CAAGGTTCCCAGCATCAAGGTCAGCGATATGCATAGATCCACCTTTCCCTTTACAGGTTC 
CAGTGTATTTACCAAGGATTTCAGCCATCATTCCGTTGAAGTCAATCCCTTTAGCAATAG 
CTTGCCCGTGTCCACGGTGGTTTGAGGTAATCAGATCATCTGGATTGAGAGCTACATAG 

ORF Predictions: 

ORF # Start End Direction Length 



1 439 1365 R 309 aa 

>[SEQ ID NO:108] 3858264-1 ORF translation from 439-1365, direction R 
VTPLSLLCLRKCVRDENVFLMGEDVGVFGGDFGTSVGMLEEFGPERVRDCPISEAAISGA
AAGAAMTGLRPIVDMTFMDFSVIAMDNIVNQAAKTRYMFGGKGQVPMTVRCAAGNGVGSA
AQHSQSLESWFTHIPGLKVVAPGTPADMKGLLKSSIRDNNPVIILEYKSEFNQKGEVPVD
PDYTIPLGVGEIKRQGTDVTVVTYGKMLRRVVQAAEELAEEGISVEIVDPRTLVPLDKDI
IINSVKKTGKVVLVNDAHKTSGYIGEISAIISESEAFDYLDAPIRRCAGEDVPMPYAQNL
KMCNDSNS*

Description: 

2-OXOISOVALERATE DEHYDROGENASE BETA SUBUNIT (EC 1.2.4.4) (BRANCHED-CHAIN
ALPHA-KETO ACID DEHYDROGENASE COMPONENT BETA CHAIN (E1))
(BCKDH E1-BETA). - BACILLUS SUBTILIS.

Assembly ID: 3858610 
Assembly Length: 1078bp

>[SEQ ID NO:22] 3858610 Strep Assembly -- Assembly id#3858610

CTAACCCTNGACGGGGCCGCTATCATCAGTCAAACAGCTAAAAATCTTGTCTGCAAAAGT 

CTCGATTAACTGAGCTTTTACAAAAGCCGTATTTCCTGGAATAACTTGGAGATTGATCAT 

CTTATCCATCAATTCAGCCGATTCGATATTGTCTTCAGCCAGTTGCAGACTTTTTACGAT 

TGATTTTGGCAATTCGTAGACATAGGTGTTGTCTCTCAAAGGAATTTTGACAATACCTAA 

CTCTTTGATATCTCGGGATACCGTCGCCTGAGTGGCAGTGATACCTGCTTCTTTCAAATG 

TTCTACAATTTCTTCTTGCGTGCCGATTTGATAATCTGTCACCAATCTTCTAATTTTTTC 

AAGTCTCTCTTTTTTATTCATTTTTAAATTGACTATGCGCCCTCTCTACTGCTTCTTTAA 

TCTCAGCAAGAATCTGATTGCTTGCTGACTTTTCTTTTTTCAAATACACTAAAAATTCAA 

TATTTCCATGTCCACCTTGGATGGGAGAAAAGTCCAAGCCAAGGACTGAAAAACCTGCCT 

CTACTGCCATAGCTGTTACAGATTCAAGGACATTCTGATGAATCTTAGCATCTCGAATAA 

TTCCATTTTTCCCAATCTGCTCACGTCCTGCCTCAAACTGAGGTTTGACAAGTGCTACCA 

CCTGACCTTGATCAGCCAAGACACGGTGCAAGGCTGGCAAAATCAGACTAAGGGAAATGA 

AACTCACATCAATACTGGCAAAGCTCGGCTCCTGCTCGAAATCAGTCTTTTCAGCATAGC 

GGAAATTGAACTGCTCCATGCTGACAACTCGTGGGTCTTGGCGTAATTTCCAAGCCAACT 

GATTGGTACCAACATCGACTGCAAAGACCAACTTGGCACTATTCTGTAGCATGACATCGG 

TAAAACCTCCAGTAGAGGCCCCGATATCAATCGTAGTCGCGCCATCCACCGACAAATCAA 

AGACCTGCAAGGCCCTTTTCCAGTTTCAAACCACCACGGCTGACATACTTGAGTTTCTCC 

CCCTTGAGTTTTAATTCGGTGTCATCTGGAATTTCTCTCCTGGCTTGTCAAACCGTTC 

ORF Predictions: 

ORF # Start End Direction Length 



1 374 949 R 192 aa 

>[SEQ ID NO:109] 3858610-2 ORF translation from 374-949, direction R 

VDGATTIDIGASTGGFTDVMLQNSAKLVFAVDVGTNQLAWKLRQDPRVVSMEQFNFRYAE

KTDFEQEPSFASIDVSFISLSLILPALHRVLADQGQVVALVKPQFEAGREQIGKNGIIRD
AKIHQNVLESVTAMAVEAGFSVLGLDFSPIQGGHGNIEFLVYLKKEKSASNQILAEIKEA
VERAHSQFKNE* 

Description: 

cytotoxin/hemolysin ORF2 tly - Serpula hyodysenteriae

Assembly ID: 3858716 
Assembly Length: 928bp

>[SEQ ID NO:23] 3858716 Strep Assembly -- Assembly id#3858716

ACTTTCCTGACCTCTGTTTCCAAATAATCTTCCAAATGGACAGAGATCTACCGTTGTTTG 

CATCGATAGCTGAGGTCTTTTTTAGAAAATACCATCACTTTTAGAAAATATAAACACATT 

TTTCGGATAAGATTAAGGTTAAAAGCAGCTCGTTTATCCAGGGTCTGATGATGGTCTTCA 

CGATAAACCACATCCAATAACCAATGCATACTTTCTGCTGACCAATGACCTCGAACACTA 

TGGCAAAAGGTCATCAACATCAAGCTTAAAGTTAAAGATAAAATAGCGAACGTCTTGACT 

TGTAATACCATCTCTATCAATAGTATTACGAGTCATTCCAATTCCACGCAATTTATGCCA 

TTTGGGATGGTTTTGACACAACCACTTAACATCAGAAGACACCCAGTATTCTCGAACTTC 

AATCTATCCTCTTTCTATATTCTAACTGAAAGGACAATTCAATGATTCATTTAATAATGA 

TTAGCGCCATTGCTCTAGCCATTGGAATTGGTTACCGCACCAAAATCAATATTGGCCTGC 

TGGCTATTGCTTTTTCTTACCTCATCGCAACCACTCTCATGGGATTAAGTCCCAAAGAAC 

TTCTTCATTTTTGGCCAACCTCACTCTTTTTTACCATTTTTAGCGTCTCTCTCTTTTATA 

ACGTTGCAACAACTAACGGTACTCTTGATGTTTTGGCTCAACACATTCTCTACCGCACAC 

GCACCCACCCTAACGCCCTCTACATGATTTTATACCTGATGGCAACCCTTTTGTCTGCTT 

TAGGTGCTGGATTTTTCACTACTATGGCCGTTTGCTGTCCTCTAGCGATTACCCTCTGTC 

AAAAAGCGGACAAACACCCTTTGATTGGAGTCAAAGCGTCAATGGGAACTTCAGGAAGGG 

TAATTTGATAACCAAAGGAATAAAATTT 

ORF Predictions: 

ORF # Start End Direction Length 



1 238 402 R 55 aa 

>[SEQ ID NO: 110] 3858716-1 ORF translation from 238-402, direction R 
VSSDVKWLCQNHPKWHKLRGIGMTRNTIDRDGITSQDVRYFIFNFKLDVDDLLP*

Description: 
unknown 

Assembly ID: 3859124 
Assembly Length: 847bp 

>[SEQ ID NO:24] 3859124 Strep Assembly -- Assembly id#3859124 
AAAAACGCACCATATCAAAAACTAAAAAGTTTGATATCATGCGTCATGTCTTAAACTAAT 

TGACTATACTTTCTATTCAAATGAGCTTTTAACCAATTGATTGAGCCAATCCACTCTTAA 
AACCAAAGGAGCAATTTCTCGGCTTAGCTGACTCTTCTCGGAATCTGAACCATGTACAAC 
ATTTTGGATAATCTCATTTTCTCCAGCAGCTTTTGCAAAATCACCTCGAATAGTGCCTGG 
TAAAGCTTCTTCTGGACGAGTTGCACCCATCATGGTCCGCCAAGTTTCGATTACTTTGGG 
ACCAGAAATGACACCCACAAGAACTGGACCTGAAGTCATGAATTCACGAATCGGTGGGTA 
AAAACTCTGACCAACCAAGTCCTGATAGTGCTGGTCAATCAACTCTTCTGAAAACCTGTG 
AACGAAACTCCAATTTTTCGATTGTAAATCCACGTTGTTCGATGCGCTTTAACACTTCAC 
CCACTAGCCCTCTTTTTACACCATCTGGTTTGATGATAAAGAATGTTTGTTCCATACCCG 
TCTCCTTTGTCAGCTTCTTTCTTTTATTTTACCACATCTCGTGGAAAAATGGAGAAAGTT 
TTCAGAAGAGAGAATGAGAGAACCCTCGGGTTCTCTCATTCTCTCTTATTCTACTGTTTC 
TTCCACAGTGTCAACGGCAGTATCCACAACTACTTCTGTTGTTTCTTCATTTCCTTCTTC 
CTCTACTGGAGGATTAAGGTATTCTTCTTCGTTGACAGCATGTGGTTCAAGGTTACGGTA 
ACGGGCCATACCAGTACCAGCTGGGATGATCTTACCGATGAATAACATTTTCCTTTAAAT 
TCCAAGG 

ORF Predictions: 

ORF # Start End Direction Length 



1 73 453 R 127 aa 

>[SEQ ID NO: 111] 3859124-1 ORF translation from 73-453, direction R 
VDLQSKNWSFVHRFSEELIDQHYQDLVGQSFYPPIREFMTSGPVLVGVISGPKVIETWRT 
MMGATRPEEALPGTIRGDFAKAAGENEIIQNVVHGSDSEKSQLSREIAPLVLRVDWLNQL 
VKSSFE* 

Description: 

NUCLEOSIDE DIPHOSPHATE KINASE (EC 2.7.4.6) (NDK) (NDP KINASE)
(ABNORMAL WING DISCS PROTEIN) (KILLER-OF-PRUNE PROTEIN). -
DROSOPHILA MELANOGASTER (FRUIT FLY)

Assembly ID: 3859244 
Assembly Length: 578bp

>[SEQ ID NO:25] 3859244 Strep Assembly -- Assembly id#3859244

ACAACCTAACTACCGNCTAATTCAGCGCGAACTTCTGCAGTAGCTGCTTCAACAACTTCA 

CGACGTGAAAGGATGAAGCGGTTTTCTTTAGCGTTAACTTCTTTGATTTTAGTATCAAAT 

TCTTGACCTACAAAACGCTCAGCGTTACGTACGAAACGAGTATCCAACATTGAAGCTGGG 

ATAAATCCACGAACACCTTCAAATTCTACTGAAAGTCCACCTTTAACGGCACGCGTTCCT 

TTAACAGTAACAACTTCTTCTTCGCGACCAACAAGTTTGTCCCATGCTTTGCGAGCTTCA 

AGGCGTTTTTTAGATGACAAGGTATGTAACTGTATCAGTATCTTTACCAACTACTTGACG 

AAGTACAAGAACATCCAATACTTCTCCTACTTTAACAAAGTCATTGATATCTGCATCACG 

ATCGTTTGTCAATTCGCGAAGAGTCAAGACACCCTTCAACACCAGTTCCCAGAAGAATGC 

AACGTTAGCTTGAGTCGCATCAACTGTCAATACTTCAGCACTAACACATCACCAGTCTCA 
ACTTGACTNACGCTATTGAGCANATCTTCAAATTCGAT

ORF Predictions:

ORF # Start End Direction Length 



1 310 462 R 51 aa 

>[SEQ ID NO: 112] 3859244-2 ORF translation from 310-462, direction R 
VLKGVLTLRELTNDRDADINDFVKVGEVLDVLVLRQVVGKDTDTVTYLVI*

Description:
unknown 

Assembly ID: 3859250 
Assembly Length: 888bp 

>[SEQ ID NO:26] 3859250 Strep Assembly -- Assembly id#3859250

GTAGTTATAGTAGGGGTCGGATTGAAATGCCACNGCGCTTCTTGGAGTTTCTGATACCGT 

TTAAAATAGCGTTGGGCATTCTGGTTGGGAGTCAGAGCCTTATCAAGCGCAATCATGATA 

GGTTGGTTGGTATAGTAGTTGTCTAGGATAACCTGGTTCTTGGTCGTTAGGCACCTGGTG 

GAGGAAGGTTGTCAGCAATTCTCCTTTTTGACGAAATTCTTCAGCGTTGTCTGTCGCCAG 

TAACTATTTTTCCTGTTTTTTGAGTTTGTGTCGGTTTTTCTGAAGTTCATTTTCAACACG 

ACGAATCAGTTCACTGGCCTGCTGTTTGACGCGGTCGCGCTCAGCCTTATCCTTATAGTA 

GGTGTCCAACAAATCAGAAAGATTTGCAAAAGGCTCTCCCACCTGATTTGCAAAAGGAAC 

TGGACTGAAGGAAGTCTCAGTCAAGCATGGCTTGGTTTCCTGATTGAAAAAATTTCGGAA 

AGCGGAAAGTTTTTCACTAACCAGTATCCTTTCCAATTCATTTGCCGTATCGCGTCCCAG 

ACCTTGAAAGAGGCTTTGAAGATTTTTTGCTGTTAGTTCTTGGGTTTGCAGGATTTCAAA 

GAGCTTTTCATCCTTGATAGTAAAAGGATTGAGAGATTCTGTACTTGGCGGAGCGATATA 

GGTCGATCCTGGAAGTAAGGTGCGGTAGCTATTTTGTGAAAAGCCGACGTGTTTGATAAC 

TTCGAGGATTTTATGACTGCTTTTATCCGACCAGTTAGAATATTACTGTGTTTCCCCATA 

ATTTCGATAATCAAGGTAGCCTGGATATGGTCTCCAATCTCGTTTTTATTGGAAACTGTA 

ATTTCCACAATACGGTCATTTTCCACTTGCTCAATCGACTCAATCAGG 

ORF Predictions: 

ORF # Start End Direction Length 



1 244 402 R 53 aa 

>[SEQ ID NO: 113] 3859250-1 ORF translation from 244-402, direction R 
VGEPFANLSDLLDTYYKDKAERDRVKQQASELIRRVENELQKNRHKLKKQEK* 

Description:
STRFBP5A NCBI gi: 496253 - Streptococcus pyogenes.
Fibrinogen/Fibronectin binding protein 

Assembly ID: 3859588 
Assembly Length: 513bp 

>[SEQ ID NO:27] 3859588 Strep Assembly -- Assembly id#3859588

ATCGAATTTTGTTCTTTCATAGAGAGCTACCTGAGTTCTATTCAAGCTCAGGTAGTACTT 

TCTTATAAACTAGACAAACTAACTGTCATTCTACCATCAGATTACAAGACATCATCGTCA 

CTCACCTTGGAATTCAATGTCGTACCCCAATGGGTAATTTTACGGTGGGGTTGAGCTAAA 

ATTGGTCTGTTTTCATAGATTGTTTGCCATCTATTCCATAGTAGGCCCGTCTTTTTCTCA 

ATCTTAACTCGCAGATTTCTCATATTTTCTTTGATTGGGAGGTTGAGGACAAAACCTGCA 

GTCTGGTTGCGACCGTTTCCTTCCCAAGAATGACTACGAACAACTTGGTTTCCATCTTTA 

TCTACTGGAACTTCTTCCCAAGTTATGGAGTAGCGGGCAATGTAAGCTCCACTGTGTTGA 

ATTATCAATGTTTTATCTTTCACAGGGAGTCTGACTGATTGGTTGAACTGGCTTAGAAAC 

TTGTGTCGCCGTTTCAGCATTCGTAGCTATAAA 

ORF Predictions: 

ORF # Start End Direction Length 



1 102 443 R 114 aa 

>[SEQ ID NO: 114] 3859588-1 ORF translation from 102-443, direction R 
VKDKTLIIQHSGAYIARYSITWEEVPVDKDGNQVVRSHSWEGNGRNQTAGFVLNLPIKEN
MRNLRVKIEKKTGLLWNRWQTIYENRPILAQPHRKITHWGTTLNSKVSDDDVL*

Description: 

PNEUMOLYSIN (THIOL-ACTIVATED CYTOLYSIN). - STREPTOCOCCUS PNEUMONIAE.

Assembly ID: 3859774 
Assembly Length: 214bp 

>[SEQ ID NO:28] 3859774 Strep Assembly -- Assembly id#3859774 

ATCGAATTCTAACATGTGCTTCTCCTTCTATTGTTCCTATCTTTAAAATCTACTCCTTCA 

TGCTCCAAGAGCCAAGCTTTCTTTTCCACTCCTGCAGCATAACCTGTCAGACGCTTGCCT 

GCTCCCAACACACGATGACAAGGTACTAGGATAGACCAAGGATTGCGTCCCACTGCTCCA 

CCAATTGCTTGAGCAGAAGCCACTTGCAGGTCTT 

ORF Predictions:

ORF # Start End Direction Length 



1 9 131 R 41 aa 
>[SEQ ID NO: 115] 3859774-1 ORF translation from 9-131, direction R
VLGAGKRLTGYAAGVEKKAWLLEHEGVDFKDRNNRRRSTC* 

Description:

GLUTAMATE RACEMASE (EC 5.1.1.3). - ESCHERICHIA COLI.

Assembly ID: 3860140 
Assembly Length: 1084bp

>[SEQ ID NO:29] 3860140 Strep Assembly -- Assembly id#3860140 

CTCCAGCAATGGATCCAAGTATGATGGGCGGGATGATGTAAGCTTTCTATAGAAAACACC 

TTATAAAAAACACGAAAGGAGGGAATGACTAACCCTTCTTTTTATAATATTCACTTCTAA 

GATTGATGGTGAGCTCTCCTAACTTATATGATAAAATAAGACTAGAGGAAAGGAGAAGAA 

CATGATCGATGTACAAGAAATTCTGTGCAAGATGACCCCCAATCAGAAGATTAATTATGA 

CCGTGTCATGCAGAAAATGGTACAAGCATGGGAAAAAAATGAGTAGCGGCCAACCATTCT 

CGTGCATGTTTGCTGTGCCCCTTGTAGTACCTATACACTAGAATATTTGACCAAGTATGC 

AGATGTGACCATCTATTTTGCCAATTCTAATATCCATCCCAAGGCAGAATACCATAAGCG 

GGTCTATGTCACCAAGAAATTTGTTAGTGATTTTAATGAGCAGACAGGAAATACGGTTCA 

GTACCTAGAAGCTCCCTACGAACCCAATTAATACCGAAAACTAGTTAGGGGGCTAGAGGA 

GGAGCCCGAAGGTGGCGACCGTTGCAAGGTTTGTTTTGACTACCGACTGGATAAAACAGC 

GCAAGTGGCTATGGACTTGGGCTTTGACTACTTTGGTTCAGCCTTGACCATCAGTCCTCA 

TAAGAATTCTCAAACTATCAATAGCATCGGAATCGATGTGCAAAAAATTTACACGCCCCA 

CTATCTTCCCAACGATTTCAAGAAAAATCAAGGCTACAAACGTTCAGTAGAGATGCGTGA 

GGAGTATGATATCTATCGTCAATGTTATTGTGGCTGCGTCTATGCAGCCCAAGCCCAGAA 

TATTGACCTGGTTTAAGTTGAGTAGGACGCCACAGCATGCTTGCTGGATAAGGATGTTGA 

GAAAGACTATTCTCATATCACATTTATAGTAGATTGAAACTAGAATAGTACACCTTTACT 

TCTCAAACATTGTTAGAAATCGATTCGGCTGTCCTTATTTCATTTTAATATACTGGTACG 

AAATTAGATATATCAATGATAACTTGCCTCAAGGTAGGTTTTTTGATAGTAGAAAAGCGA 

TAGA 

ORF Predictions: 

ORF # Start End Direction Length 



1 302 511 F 70 aa 

2 605 856 F 84 aa 

>[SEQ ID NO: 116] 3860140-1 ORF translation from 302-511, direction F 

VHVCCAPCSTYTLEYLTKYADVTIYFANSNIHPKAEYHKRVYVTKKFVSDFNEQTGNTVQ 

YLEAPYEPN*

Description: 
unknown 
>[SEQ ID NO: 117] 3860140-2 ORF translation from 605-856, direction F

VAMDLGFDYFGSALTISPHKNSQTINSIGIDVQKIYTPHYLPNDFKKNQGYKRSVEMREE 

YDIYRQCYCGCVYAAQAQNIDLV* 

Description: 
unknown 

Assembly ID: 3860206 
Assembly Length: 1124bp

>[SEQ ID NO: 30] 3860206 Strep Assembly -- Assembly id#3860206

ATCGAATTCATTGACTGCCTGAAAAGACTTCAACTCGTCTGCCTGATAACCGAAAGACTT 

GGTTACTTTGATACCTGATACGGACTCCTGTACCTTGTTATTGAGTTCAGAAAAAGCAGC 

TTGGGATTCGCCAAAGGCCTTATGAGTCTTTCTCCCTAGGCGACTAGTCGTATAGGCCAT 

GAAAGGTAGGGGGAGAATGGCAACAAGAGTCATCTGCCATGAGATGCTAAAGAGCATGGT 

CAACAAAGTCACCAGAGCCGTGATAGAGGCATCCACCGCAGACATGACACCGCCACCTGC 

TAAACGAGTCAAGGAATTGATATCATTGGTTGCGTGTGCCATCAGATCACCCGTCCGATA 

GGTTTGATAAAAGGCTGACGACATTTTTGTGAAATGCTTAAACAAGCGAGACCGCATGAT 

CTGTCCCAAGCAATAAGAGGTCCCAAGGATATACATACGCCACACATAGCGCAAATAGTA 

CATACCAAAGGCTGCAAGTAGCAAGTAAAATAGGCTAAGAAGGAGGTCCTGCTGGGTTAA 

TTGCCCCGATGTGATGGCATCAATAACCCGCCCCATAACCATAGGAGGAATGAGATTGAG 

GACGGAAACCAAGACCAGGGCCACAATCCCGACTAGATAACGGCGTTTTTCTAACTTGAA 

AAACCACCAAAATTTTTGAATAATGGACATAAAATCCCTTTCTGGATTGCAAATAGAAAC 

CTGAGGCCAATACTCAATGGAAAATCAAAGAGCAAACTAGGAAACTAGCCGCAGGCTGCT 

CAAAGCACTGCTTTGAGGTTGTAGATAGAACTGACGAAGTCAGTAACCTACATACGGCAA 

GGCGACGTTGACGCCGTTTGAAGAAATTTCCGAAGAATACAAGACCCCAGGTTTTTCTTA 

TTTATAAGTTACCACTGTAACAGCACCCTTGTCATATTCAGCAATAAAGATATTGGCTAC 

ATTGTCATGCCCTTGTTTACTGAGGTTATCAAGCAACCACTCCTCGCTACGAACAATCGA 

TCCCAAGACATCTACTTGAATCACACCGTCAGTCACAACTGGATACTTAGGATTTTCATC 

TCCCATTTGCACAACGATGAGTTGCCCATTTTGCTCTTGCACAG 

ORF Predictions: 

ORF # Start End Direction Length 



1 898 1056 R 53 aa 

>[SEQ ID NO: 118] 3860206-2 ORF translation from 898-1056, direction R 
VTDGVIQVDVLGSIVRSEEWLLDNLSKQGHDNVANIFIAEYDKGAVTVVTYK*

Description: 
unknown 

Assembly ID: 3860270 
Assembly Length: 1242bp

>[SEQ ID NO: 31] 3860270 Strep Assembly -- Assembly id#3860270 

TTACCTTCATTGCAGCCATTATTGGTTCTTGTGTCAGCCAGATTTTAAGTATTCTTTATA 

AGACACCTGCTGTGGTCTTTATCTTGGCCATTTTGGCACCGCTGGTTCCAGGTTATCTCT 

CCTACCGAACAACTGCCTTTTTTGTGACAGGGGACTATAATAAAGCACTGGCAAGTGCGA 

CCTTGGTTGTCATGTTGGCTTTGGTAATCTCTATTGGAATGGCTAGCGGTACAGTGATTC

TCAGACTGTATCATTATATAAAAACACATCGAGTATCGTAGACTTTACAGAAATAAAAGA 

ATTTTCTGAAAAATGAGATAAATAAATTAACAACGGTTTCTATATGTGCGAGAATACCGC 

ACTTATGAAGAAATTGCGGCTGATTTTGGTATCCACGAAAGCAACTTAATCCGTCGGAGC 

CAATGGGTTGAAGTAACTCTTGTTCAAAGTGGTGTTACGATTTCAAAAACTCATCTTAGT 

GCTGAGAATACGGTGATTGTGGATGCAACAGAGGTAAAAATCAATCGCCCTAAAAAACAA 

TTAGCGAATGATTCTGGTAAAAAGAAATTTCACGCTATGAAGGCTCAGGCGATTGTCACA 

AGTCAAGGGAGAATTGTTTCTTTGGATATCGCTGTGAACTATTGTCATGATATGAAGTTG 

TTCAAAATGAGTCGCAGAAATATCGGACAAGCTGGAAAAATCTTGGCTGATAGTGGTTAT 

CAAGGGCCCATGAAGATATATCCTCAAGCACAAACTCCACGTAAATCCAGCAAACTCAAG 

CCGCTAATAGCTGAAGATAAAGCTTATAACCATGCGCTATCCAAGGAGAGAAGCAAGGTT 

GAGAACATCTTTGCCAAAGTAAAAACGTTTAAAATGTTTTCAACAACCTATCGAAATCAT 

CGTAAACGCTTCGGATTACGAATGAATTTGATTGCTGGCATTATCAATTATGAACTAGGA 

TTCTAGTTTTGCAGGAAGTCTATTATTTTCCTTATTGTCTGTAAGTCTACTGACCTTGTT 

GTTTATCCCAGTCATGGTTTCTAGTTCGGGCTCAGAGTTTCAAAGTGGATGGCAAGAGCA 

TCAATTGATTGCTGAGAAGGTTAGTAAAACACTTGACAAGACATTTGATAAGGATGTCAG 

AAAAATTCCGACCAGTCAGTTTTATCAAAAATTTGTAGATGAGATGGGAAGGATTTACTC 

AGGAAATTTGATCCTCCCAGGAGCTGATAACTGTGAATGGAG 

ORF Predictions: 

ORF # Start End Direction Length 



1 346 966 F 207 aa 

>[SEQ ID NO: 119] 3860270-1 ORF translation from 346-966, direction F 

VREYRTYEEIAADFGIHESNLIRRSQWVEVTLVQSGVTISKTHLSAENTVIVDATEVKIN 

RPKKQLANDSGKKKFHAMKAQAIVTSQGRIVSLDIAVNYCHDMKLFKMSRRNIGQAGKIL 

ADSGYQGPMKIYPQAQTPRKSSKLKPLIAEDKAYNHALSKERSKVENIFAKVKTFKMFST 

TYRNHRKRFGLRMNLIAGIINYELGF*

Description:

ISL2 protein - Lactobacillus helveticus (Probable transposase) 

Assembly ID: 3860438 
Assembly Length: 1575bp

>[SEQ ID NO:32] 3860438 Strep Assembly -- Assembly id#3860438
GTGATGGGGCCTCAGGGAAATGGTTTTGACTTGTCTGACCTTGATGAGCAGAATCAGGTT
CTCCTTGTTGGTGGTGGGATTGGTGTTCCACCCTTGCTTGAGGTGGCCAAGGAATTGCAT 
GAACGTGGAGTGAAAGTAGTGACAGTCCTCGGTTTTGCTAATAAGGATGCTGTTATTTTG 
AAAACGGAATTGGCTCAGTATGGTCAGGTCTTTGTAACGACAGATGATGGTTCTTATGGC 
ATCAAGGGAAATGTTCCGTTGTTATCAATGATTTAGATAGTCAGTTTGATGCTGTTTACT 
CGTGTGGGGCTCCAGGAATGATGAAGTATATCAATCAAACCTTTGATGATCACCCAAGAG 
CCTATTTATCTCTGGAATCTCGTATGGCTTGTGGGATGGGAGCTTGCTATGCCTGTGTTC 
TAAAAGTACCAGAAAGCGAGACGGTCAGCCAACGCGTCTGTGAAGATGGTCCTGTTTTCC 
GCACAGGAACAGTTGTATTATAAGGAGAAAATTATGACTACAAATCGATTACAAGTGTCT 
CTACCTGGTTTGGATTTGAAAAATCCGATTATTCCAGCATCAGGCTGTTTTGGCTTTGGA 
CAAGAGTATGCCAAGTACTATGATTTAGACCTTTTAGGTTCTATTATGATCAAGGCGACA 
ACCCTTGAACCACGTTTTGGGAATCCAACTCCAAGAGTGGCAGAGACGCCTGCTGGTATG 
CTCAATGCAATTGGCTTGCAAAATCCTGGTTTAGAGGTTGTTTTGGCTGAAAAGCTACCT 
TGGCTGGAAAGAGAATATCCAAATCTTCCTATTATTGCCAATGTAGCTGGTTTTTCAAAA 
CAAGAGTATGCAGCTGTTTCTCATGGGATTTCCAAGGCAACTAATATAAAAGCTATCGAG 
CTCAATATTTCTTGTCCCAATGTTGACCACTGTAATCATGGACTTTTGATTGGTCAAGAT 
CCAGATTTGGCTTATGATGTGGTGAAAGCAGCTGTGGAAGCCTCAGAAGTGCCAGTTTAT 
GTCAAATTAACCCCGAGTGTGACCGATATCGTTACTGTCGCAAAAGCTGCAGAAGATGCG 
GGAGCAAGTGGCTTGACTATGATCATACTCTGGTGGGATGCGCTTTGACCTCAAAACCAG 
AAAACCAATCTTGGCCAATGGAACAGGTGGAATGTCAGGTCCAGCAGTTTTCCAGTAGCC 
CTCAAACTCATCCGCCAAGTAGCCCAAACAACAGACCTGCCTATCATTGGAATGGGGGGA 
GTGGATTCGGCTGAAGCTGCCCTAGAAATGTATCTGGCTGGGGCATCTGCTATCGGAGTT 
GGAACAGCTAACTTTACCAATCCTTATGCCTGCCCTGACATCATCGAAAATTTACCAAAA 
GTCATGGATAAATACGGTATTAGCAGTCTGGAAGAACTCCGTCAGGAAGTAAAAGAGTCT
CTGAGGTAAACTGCAATCAATCTGTTCTTGATTTTTTATTAGTTTGTAATATGAATTTAG 
GAGAATTTTGGTACAATAAAATAAATAAGAACAGAGGAAGAAGGTTAATGAAGAAAGTAA 
GATTTATTTTTTTAG 

ORF Predictions: 

ORF # Start End Direction Length 



1 1 276 F 92 aa 

2 460 1128 F 223 aa 

>[SEQ ID NO:120] 3860438-1 ORF translation from 1-276, direction F 
VMGPQGNGFDLSDLDEQNQVLLVGGGIGVPPLLEVAKELHERGVKVVTVLGFANKDAVIL
KTELAQYGQVFVTTDDGSYGIKGNVPLLSMI*

Description:
unknown 

>[SEQ ID NO:121] 3860438-3 ORF translation from 460-1128, direction F 
VKMVLFSAQEQLYYKEKIMTTNRLQVSLPGLDLKNPIIPASGCFGFGQEYAKYYDLDLLG
SIMIKATTLEPRFGNPTPRVAETPAGMLNAIGLQNPGLEVVLAEKLPWLEREYPNLPIIA
NVAGFSKQEYAAVSHGISKATNIKAIELNISCPNVDHCNHGLLIGQDPDLAYDVVKAAVE
ASEVPVYVKLTPSVTDIVTVAKAAEDAGASGLTMIILWWDAL*

Description:

DIHYDROOROTATE DEHYDROGENASE (EC 1.3.3.1) (DIHYDROOROTATE OXIDASE) 
(DHODEHASE). - BACILLUS SUBTILIS.

Assembly ID: 3860544 
Assembly Length: 776bp

>[SEQ ID NO:33] 3860544 Strep Assembly -- Assembly id#3860544

CTAAGATATCAGAATAACAACGAAATCGAAGCATTAAAAACAAATATTACTTCTAAGAAT

AGCGAGATTGATAGTCAACAAAGCAATATTAAGGATATGACCGTACCTATAATGATCCAA 

CTTCTCAGGCTTATAATATTTATGCTCAATTAATTAGTGAGTTAGGTACTGCTCGTTCAA 

ACAACAATAAAAGTATTACAGAGCTTGAGGCTAATCTTGGAGTGGCAACAGGTCAAGATA 

AAGCTCATAGTATATTAGCGTCAAATGAAGGTACTCTGCATTATCTGGTACCTTTGAAAC 

AAGGAATGTCTATTCAGCAGGGGCAAACGATAGCAGAAGTTTCAGGGAAAGAAAAAGGTT 

ACTATGTAGAGGCTTTTGTACTTGCGAGTGATATTTCTCGTGTTTCAAAAGGAGCAAAAG 

TTGATGTTGCTATTACTGGTGTGAATAGTCAAAAATATGGAACACTAAAGGGACAAGTCA 

GACAGATTGATTCAGGAACAATTTCCCAAGAAACGAAAGAGGGGAATATTAGCCTCTATA 

AAGTCATGATAGAATTAGAAACCTTAACTCTAAAACATGGAAGCGAGACGGTCATACTCC 

AAAAGGATATGCCAGTTGAAGTGCGGATTGTCTATGATAAAGAAACCTATCTTGATTGGA 

TTTTAGAAATGTTAAGTTTCAAGCAATAATTGGTTTTAAACCTTAGGTAACCTATAAAAA 

CAAATAAGGTAGAGAAAGGATATTTTATCTAAGTTAGCTCACATTACTGCCATTCC 

ORF Predictions: 

ORF # Start End Direction Length 



1 222 689 F 156 aa 

>[SEQ ID NO: 122] 3860544-1 ORF translation from 222-689, direction F 
VATGQDKAHSILASNEGTLHYLVPLKQGMSIQQGQTIAEVSGKEKGYYVEAFVLASDISR
VSKGAKVDVAITGVNSQKYGTLKGQVRQIDSGTISQETKEGNISLYKVMIELETLTLKHG
SETVILQKDMPVEVRIVYDKETYLDWILEMLSFKQ* 

Description: 
unknown 

Assembly ID: 3860558 
Assembly Length: 1487bp 

>[SEQ ID NO: 34] 3860558 Strep Assembly -- Assembly id#3860558
CTGGCCTTTCTCCACCAAAATTGTTCCTTGAGGGAAGGAAGTCAGAACACTAGCCGTTGC
ATCTTCCTTTTGCTTTTCAATCGTAATTCCAGATAATTTTTCCCATTCTTTTTGGTGACC 
CCGGGAGGCAGGATTGAATGGCTTGAGGGAAATGACAAACTTGTCCTAGCAAGAATGGTC 
AAGGCACCTCCGTCTACAATCAAAATCTGATTTGGGCTTAAATTAACAAAGACCTGTTTT 
ACTAGATTTTCTCCAGAAGCATCGTCTCGTAAACCAGGCCCCAGCAAGATAACTTCTGCC
TTCTCCAATTGCTCTTTTAACAATTGCTGGTCTTGAAGAGAAAAGGCCATAGGCTCAGGT 
AAATGGCTGTGCAGAGCCGGGATATTTTCCCTGTCCGTTCCAACGGTCACCAATCCTGCA 
CCGCTTTTTACAGCTGCTAAAGCAGCCATGATGATGGCACCTCCATAAGGATAAGTACCA 
CCAAGCAGCAGCAGACGACCATAATCTCCTTTATGACTTGAACGAGAACGTTCAATAATA 
ACTTTTTCTAGTAAGGTTTGATTAATCACTTTCATCCTTTTTCCCTCTCACTTTTATTAT 
ACAACAAAAAGGAGACGCAGACCTCCTTTTGTAATCTTATATCTAAAATTTAATATTCAT 
TTCTGCCATTTTAGATATAGCTATAGAAAATACACTCTATTAATCGAATGTTTCTCTTAT 
TTTCTATCCAATGTCCGAAGTGCTGCTTGATAAGTTTGCTCCATCAGCATGGTAATGGTC 
ATAGGACCGACACCTCCAGGGACTGGCGTGATATGGCTAGCAAGTGGTGCAACTGCCTCA 
TAATCAACATCTCCACAGAGCTTCCCATTTTCATCTCGGTTCATCCCAACGTCAATGACA 
ACCGCACCTGGTTTGACAAAGTCAGCAGTCACAAACTTGGCGCGGCCGATTGCGACTACA 
AGAATATCTGCTTTAGCAGCCACCTTGGCAAGATTATGAGTTCGTGAGTGGGCCAAGGTT 
ACTGTCGCATTTTTAGCCAAAAGAAGCTGAGCCATAGGTTTTCCAACGATATTTGAACGA 
CCGATTACGACCGCATTTTTACCTTCCAAGTCAATCCCATATTCATGAAACATTTCCATA 
ATTCCTGCAGGTGTCGAGGGAATCATGACTGGATGTCCAGACCAAAGACGTCCCATGTTT 
AGGGGATGGAAACCATCCACATCCTTTTCTGGGTCAATGGCTAATAAAACCGCCTCTTCA 
TCGATATGTTTTGGTAATGGCAACTGGACCAAAATCCCATGCCAAGCTGGATCCTGATTA 
TATTTAGCAATCAGGTCTAACAATTCCTCTTGAGTAATGGTCTCTGGAACTCGCACTACT 
TCGGTACGGGAACCAGCCGCAAGAGCTGACCTCTCCTTGTTGCGAACGTTAAACTTGGCT 
GGCTGGATTATCCCCAACCAAAATCACTACCAAACCAGGCACTAGAG 

ORF Predictions: 

ORF # Start End Direction Length 



1 717 1376 R 220 aa 

>[SEQ ID NO:123] 3860558-2 ORF translation from 717-1376, direction R 

VRVPETITQEELLDLIAKYNQDPAWHGILVQLPLPKHIDEEAVLLAIDPEKDVDGFHPLN 

MGRLWSGHPVMIPSTPAGIMEMFHEYGIDLEGKNAVVIGRSNIVGKPMAQLLLAKNATVT

LAHSRTHNLAKVAAKADILVVAIGRAKFVTADFVKPGAVVIDVGMNRDENGKLCGDVDYE

AVAPLASHITPVPGGVGPMTITMLMEQTYQAALRTLDRK* 

Description: 

5,10-methylene-tetrahydrofolate dehydrogenase (folD) homolog -
Haemophilus influenzae (strain Rd KW20)
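The "direction" field can be interpreted as follows. This is a minimal sketch under stated assumptions, not the procedure used to produce the listing. Coordinates are taken as 1-based and inclusive on the assembly strand as printed. F ORFs are translated directly, and R ORFs are translated from the reverse complement of the same span. The standard genetic code is applied with no special handling of start codons, so an initiating GTG comes out as V. This is consistent with the folD entry above: positions 1374-1376 of assembly 3860558 read CAC, whose reverse complement GTG gives the V that opens SEQ ID NO:123. Codons containing N are rendered as X, as in SEQ ID NO:126. The names `translate_orf`, `CODON_TABLE` and `COMPLEMENT` are illustrative only.

```python
# Standard genetic code; codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate_orf(assembly: str, start: int, end: int, direction: str) -> str:
    """Translate assembly[start..end] (1-based, inclusive) on strand F or R."""
    region = assembly[start - 1:end].upper()
    if direction == "R":
        # Reverse strand: complement each base, then reverse the order.
        region = region.translate(COMPLEMENT)[::-1]
    elif direction != "F":
        raise ValueError("direction must be 'F' or 'R'")
    # Codons containing N (or any unexpected character) become X.
    return "".join(
        CODON_TABLE.get(region[p:p + 3], "X")
        for p in range(0, len(region) - 2, 3)
    )
```

The sketch was checked against two excerpts from the listing:

- **Reverse strand.** Applied to TCTCTGGAACTCGCAC (positions 1361-1376 of assembly 3860558) with start 2, end 16 and direction R, it returns VRVPE, the first five residues of SEQ ID NO:123.
- **Forward strand.** Applied to GTGAAAGCAGCTGTGGAAGCCTCAGAAGTGCCAGTTTAT from assembly 3860438, it returns VKAAVEASEVPVY, a stretch of SEQ ID NO:121.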

Assembly ID: 3860568 
Assembly Length: 1634bp 

>[SEQ ID NO: 35] 3860568 Strep Assembly -- Assembly id#3860568

CGTGCCTTGGCCAATGATCCAAAAATCTTGATTTCAGACGAGTCGCTTCAAATTTCGGCC 

CCTGGACCCTTAAGACCAACCCAAGCAGATTTTGGCCCTTGGTTGCAAGATTTGAACCAA 

AAATTAGGCTTGACTGTTGTCCTGATTACGCATGAAATGCAGATTGTCAAAGACATTGCC 

AACCGTGTTGCAGTTATGCAGGATGGGCATTTGATTGAAGAGAGTAGTGTGCTTGAAATC 

TTCTCAGACCCTAAACAACCTTTGACTCAAGACTTTATCTCAACAGCTACAGGTATTGAC 

GAAGCCATGGTCAAAATCGAGAAGCAAGAAATCGTGGAACACTTGTCTGAAAACAGTCTC 

TTGGTGCAACTCAAGTACGCTGGATCTTCAACAGACGAGCCACTTTTGAATGAATTGTAC 

AAGCATTATCAAGTAATGGCTAATATTCTCTATGGGAATATCGAAATCCTCGATGGTACT 

CCTGTTGGAGAATTGGTGGTGGTCTTGTCAGGTGAAAAAGCAGCGCTGGCAGGTGCTCAA 

GAAGCCATTCGTCAAGCAGGCGTACAGTTAAAAGTATTGAAGGGAGGACAGTAAGATGGA 

ATCATTGATTCAAACCTATTTACCAAATGTCTATAAGATGGGTTGGTCTGGTCAGGCAGG 

CTGGGGAACAGCTATCTACCTAACCCTCTATATGACAGTTCTTTCCTTCATTATCGGAGG 

CTTCTTGGGGCTAGTGGCAGGTCTCTTTCTCGTCTTGACAGCGCCAGGTGGTGTCTTGGA 

GAATAAAGTCGTATTCTGGATTTTAGACAAAATTACCTCAATTTTTCGTGCGGTTCCCTT 

TATCATCCTCTTGGCAATCTTGTCACCACTTTCTCACTTGATTGAAAAAACAAGTATCGG 

GCCAAATGCAAGCCCTTGTCCCACTTTCTTTTGCAGTCTTTGCCTTCTTTGCCCGTCAGG 

TGCAGGTTGTCTTGGCTGAAATGGATGGCGGTGTCATTGAGGCGGGCTCAAAGCGAGCGG 

AGCGACTTTCTGGGACATCGTGGGTGTTTACCTATCAGAAGGTCTTCCAGATTTGATCCG 

TGTGACGACTGTGACCTTGATTTCCCTTGTTGGGGAAACAGCTATGGCCGGTGCGGTTGG 

AGCTGGTGGTATCGGTAACGTAGCCATCGCTTATGGATTTAACCGCTACAATCACGATGT 

GACCATCTTGGCAACCATCGTTATCATTTTGATTATCTTTGCAATCCAATTCTTAGGAGA 

TTTCTTGACTAAGAAATTGAGCCATAAATAAAAAAGAGCCGTGTGGCTCTTTTTAACTGA 

TCAGATTTTCTGGGCAAATTTTTTACTCAAGGCTTGTCCAATCAAGGCACCCACTAGGGC 

TCCGATGACAATACTTGCGATAAATAGAAGGACAGTTCCAGGGTTTGGAGCGACCATGAT 

GCGGTCGATATATTCTTGGGATTTTCCTCTTGCCAGAAGAGTAGCCATATAGGCTTTGGG 

CGCAATCCACATAAGCAAGATTGGTCCTGTTGTACTAAAGGCGAAAATAATGAAAGAAAG 

GAAGTTCTTTGTTTTGTCCTTGTATTTTCCTAAATGAGCTACTCCATCTGCTAGGAGGCC 

ACAGATAATTCGAT 

ORF Predictions: 

ORF # Start End Direction Length 



1 1040 1291 F 84 aa 

>[SEQ ID NO: 124] 3860568-3 ORF translation from 1040-1291, direction F 
VGVYLSEGLPDLIRVTTVTLISLVGETAMAGAVGAGGIGNVAIAYGFNRYNHDVTILATI
VIILIIFAIQFLGDFLTKKLSHK*

Description: 
unknown 

Assembly ID: 3860582
Assembly Length: 1087bp 

>[SEQ ID NO: 36] 3860582 Strep Assembly — Assembly id#3860582 

GGAATCATGATGATGTCACTGCTAAATGGTTTCTTAGAAAAAATATTTCCTGAGCGCTTA 

CAGATTAGTTTGGGCTTGCTGATTTTATCATTGAGCGGTACAGCTCCCTTCTGGTACCAA 

GCCTATCCCTTTGTCTTTGGAACACGGCTTCTCTTTGGTTTGGGTCTTGGGATGATCAAT 

GCCAAGGCCATTTCTATTATCAGTGAACGCTACCAAGGAAAAAGGCGAATTCAGATGTTA 

GGGCTACGCGCTTCTGCAGAGGTCGTTGGAGCTTCTCTCATTACCTTGGCCGTCGGTCAA 

GTTGTTGGCCTTTGGTTGGACAGCTATCTTTCTAGCCTATAGTGCTGGATTTTTGGTGCT 

GCCCCTTTATCTGCTCTTTGTCCCTTATGGAAAATCAAAGAAAGAAGTCAAGAAAAGAGC 

GAAGGAAGCAAGTCGTTTAACTCGAGAAATGAAAGGCTTGATTTTTACCTTAGCTATCGA 

AGCGGCAGTTGTAGTTTGTACCAATACAGCTATTACCATCCGTATTCCAAGTTTGATGGT 

GGAAAGAGGATTGGGGGATGCCCAGTTATCTAGTTTTGTTCTTAGTATCATGCAGTTGAT 

CGGGATTGTGGCTGGGGTGAGTTTTTCTTTCTTGATTTCTATCTTTAAAGAGAAACTGCT 

CCTCTGGTCTGGTATTACCTTTGGCTTGGGGCAAATCGTGATTGCCTTGTCTTCATCCTT 

GTGGGTGGTAGTAGCAGGAAGTGTTCTGGCTGGATTTGCCTATAGTGTAGTCTTGACGAC 

GGTCTTTCAACTTGTCTCTGAACGAATTCCAGCTAAACTCCTCAATCAAGCAACTTCATT 

TGCTGTATTAGGCTGTAGTTTCGGAGCCTTTACGACCCCATTCGTTCTAGGTGCAATTGG 

CTTACTAACTCACAATGGGATGTTGGTCTTTAGTATCTTAGGAGGTTGGTTGATTGTAAT 

CTCTATCTTTGTCATGTACCTACTTCAGAAGAGAGCTCTAGGATTGATTCCTAAGTTTTT 

CTTTTGATACTCAATGAAAATCAAAGAGCAAACTATAGTTGATTGAGTTTGGAATAGTAT 

GCTGTAG 

ORF Predictions: 

ORF # Start End Direction Length 



1 356 1027 F 224 aa 

>[SEQ ID NO: 125] 3860582-1 ORF translation from 356-1027, direction F 

VLPLYLLFVPYGKSKKEVKKRAKEASRLTREMKGLIFTLAIEAAVVVCTNTAITIRIPSL

MVERGLGDAQLSSFVLSIMQLIGIVAGVSFSFLISIFKEKLLLWSGITFGLGQIVIALSS

SLWVVVAGSVLAGFAYSVVLTTVFQLVSERIPAKLLNQATSFAVLGCSFGAFTTPFVLGA

IGLLTHNGMLVFSILGGWLIVISIFVMYLLQKRALGLIPKFFF*

Description: 
unknown 

Assembly ID: 3860724 
Assembly Length: 1191bp

>[SEQ ID NO: 37] 3860724 Strep Assembly -- Assembly id#3860724 
GGATTCCAACGATTATGAACTTGACTGGTCCACTGATTCATCCAATGGCTTTAGAAACAC 
AGCTTTCTTGGAATTAGTCGTCCAGACTCCTAGAAAGTACAGCTCAGGTTTTGAAAATAT
GGTCGCAAACGTGCCATCGTGGTTGCTGGACCAGAAGGGTTGGATGAAGCTGGCTTGAAC 
GGAACAACCNAGATTGCACTTNTTGAAAATGGCGAAATCAGCTTGTCAAGCTTTACTCCA 
GAGGATTTGGGAATGGAAGGCTATGCTATGGAAGATATTCGTGGTGGGAATGCTCAGGAA 
AATGCAGAAATTTTGCTTAGCGTTCTGAAAAACGAAGCAAGTCCATTCTTGGAAACGACA 
GTCTTGAATGCTGGTCTTGGTTTCTATGCTAATGGTAAGATTGATAGCATCAAGGAAGGA 
GTTGCCTTGGCCCGTCAAGTGATTGCTAGAGGCAAGGCCCTTGAAAAACTCAGACTGTTA 
CAGGAGTACCAAAAATGAGTCAGGAATTTTTAGCACGAATCTTAGAGCAGAAGGCGCGTG 
AGGTGGAGCAGATGAAGCTGGAGCAAATCCAGCCTCTGCGCCAGACCTATCGCTTGGCAG 
AATTTTTGAAGAATCATCAGGACCGCTTGCAGGTAATCGCTGAGTCAAGAAAGCTAGCCC 
TAGTTTGGGAGATATCAATCTCGATGTGGATATTGTGCAACAGGCCCAGACTTATGAAGA 
AAACGGAGCAGTGATGATTTCGGTGTTGACAGATGAGGTTTTCTTTAAAGGGCATTTGGA 
TTATCTACGGGAAATTTCCAGTCAGGTAGAGATTCCGACGCTCAACAAAGACTTTATCAT 
AGATGAAAAGCAAATCATCCGCGCTCGCAATGCAGGTGCGACAGTTATCTTGCTTATTGT 
GGCAGCCTTGTCCGAAGAACGCCTCAAGGAACTGTATGACTACGCGACAGAGCTTGGTCT 
GGAAGTCTTAGTGGAGACTCACAATCTAGCTGAACTAGAGGTAGCCCACAGACTTGGTGG 
CTGAGATTATCGGGGTCAACAACCGCAACTTGACTACCTTTGAAGTCGACTTGCAGACCA 
GTGTAGATTTAGCCCCTTACTTTGAGGAAGGTCGCTATTACATTTCTGAATCTGCCATTT 
TCACAGGGCAGGATGCGGAACGACTAGCCCCATACTTTAACGGAATTCGAT 

ORF Predictions: 

ORF # Start End Direction Length 



1 139 498 F 120 aa 

2 686 1024 F 113 aa 

>[SEQ ID NO: 126] 3860724-1 ORF translation from 139-498, direction F 

VVAGPEGLDEAGLNGTTXIALXENGEISLSSFTPEDLGMEGYAMEDIRGGNAQENAEILL 

SVLKNEASPFLETTVLNAGLGFYANGKIDSIKEGVALARQVIARGKALEKLRLLQEYQK* 

Description:

ANTHRANILATE PHOSPHORIBOSYLTRANSFERASE (EC 2.4.2.18). - LACTOCOCCUS
LACTIS (SUBSP. LACTIS) (STREPTOCOCCUS LACTIS).

>[SEQ ID NO: 127] 3860724-2 ORF translation from 686-1024, direction F 
VDIVQQAQTYEENGAVMISVLTDEVFFKGHLDYLREISSQVEIPTLNKDFIIDEKQIIRA
RNAGATVILLIVAALSEERLKELYDYATELGLEVLVETHNLAELEVAHRLGG* 

Description: 

INDOLE-3-GLYCEROL PHOSPHATE SYNTHASE (EC 4.1.1.48) (IGPS). -
LACTOCOCCUS LACTIS (SUBSP. LACTIS) (STREPTOCOCCUS LACTIS). 

Assembly ID: 3860858 
Assembly Length: 858bp

>[SEQ ID NO: 38] 3860858 Strep Assembly -- Assembly id#3860858 

ATCGAATTTGCCAACCAAGAAAAATATCCCTTGGATGGTTCTTGGCAATGCAAGCAATAT 

CATCGTTCGTGATGGTGGGATTCGTGGATTTGTCATCTTGTGTGACAAGCTCAATAACGT 

TTCTGTTGATGGCTATACCATTGAAGCAGAAGCTGGGGCTAACTTGATTGAAACAACTCG 

CATTGCCCTCCGTCATAGTTTAACTGGCTTTGAGTTTGCTTGTGGTATTCCAGGAAGCGT 

TGGCGGTGCTGTCTTTATGAATGCGGGTGCCTATGGTGGCGAGATTGCTCACATCTTGCA 

GTCTTGTAAGGTCTTGACCAAGGATGGAGAAATCGAAACCCTGTCTGCTAAAGACTTGGC 

TTTTGGTTACCGCCATTCAGCTATTCAGGAGTCTGGTGCAGTTGTCTTGTCAGTTAAATT 

TGCCCTAGCTCCAGGAACCCATCAGGTTATCAAGCAGGAAATGGACCGCTTGACGCACCT 

ACGTGAACTCAAGCAACCTTTGGAATACCCATCTTGTGGCTCGGTCTTTAAGCGTCCAGT 

CGGGCATTTTGCAGGTCAGTTCGAATTTCAGAAGCTGGCTTGAAAGGCTATCGTATCGGT 

GGCGTAGAAGTGTCAGAAAAGCATGCAGGATTTATGATCAATGTCGCAGATGGAACGGCC 

AAAGACTACGAGGACTTGATCCAATCGGTTATCGAAAAAGTCAAGGAACACTCAGGTATT 

ACGCTTGAAAGAGAAGTCCGGATCTTGGGTGAAAGCCTATCGGTAGCGAAGATGTATGCA 

GGTGGTTTTACTCCCTGCAAGAGGTAGTGGGGACCTGACAGAGCCCCGATCGGTTAATCT 

ATGAAAAAGAAGGAATTT 

ORF Predictions: 

ORF # Start End Direction Length 



1 610 807 F 66 aa 

>[SEQ ID NO: 128] 3860858-1 ORF translation from 610-807, direction F 

VSEKHAGFMINVADGTAKDYEDLIQSVIEKVKEHSGITLEREVRILGESLSVAKMYAGGF 

TPCKR* 

Description: 
unknown 

Assembly ID: 3860890 
Assembly Length: 980bp

>[SEQ ID NO: 39] 3860890 Strep Assembly -- Assembly id#3860890 

CTGAAAAAACAGGTTTTGACTATGNAGATTGACAGACGACCGTTCGGAGGTGCAGATATT 

GATGCAGCAGGACCTCCCTTACCTGATGAAACCCTTAAGGCAAGTAGGGAAGCAGATGCT 

ATCCTACTAGTAGCTATCGGTAGTCCTCAGTATGATGGAGTAGCGGTTCGCCCTGAACAA 

GGCCTGATGGCTCTCCGTAAGAACTCAATCTTTACGCTAATATTCGTCCTGTAAAAATCT 

TTGACAGTCTCAAGTATTTGTCACCACTCAAACCGGAACGAATTTCTGGTGTAGACTTCG 

TCGTGGTGCGTGAATTGACTAGGCGAGATTTACTTTGGAGATCATATCCTTGAAGAGCGC 

AAAGCGCGTGATATCAACGACTATAGCTATGAGGAAGTGGAGCGGATTATTCGCAAAGCC 

TTTGCCATCGAATTGCAAGAAATCGCAGAAAAATCGTTACTAGTATCGATAAGCAAAATG 
TTCTAGCGACCTCAAAACTCTGGCGGAAAGTAGCTGAGGAAGTCGCACAGGATTTCTCAG
ATGTAACCTTGGAACACCAGCTGGTAGACTCAGCTGCTATGCTTATGATTACCAATCCTG 
CTAAGTTTGATGTTATTGTAACGGAGAATCTTTTTGGAGATATTTTATCTGATGAATCAA 
GCGTCTTATCTGGTACACTTGGGGTTATGCCATCAGCCAGTCATTCTGAAAATGGACCAA 
GTCTCTATGAACCTATTCACGGTTCAGCACCTGATATTGCAGGTCAAGGAATTGCCAATC 
CTATTTCCATGATTTTATCAGTTGTCATGATGTTGAGAGATAGTTTCGGACGTTATGAGG 
ATACAGAGCGTATCAAACGTGCTGTTGAGACAAGTCTGGCGGCAGGAATTTTAACGAGAG 
ATATAGGAGGTCAGGCTTCAACAAAGGAAATGATGGAAGCTATTATTGCAAGGTTATGAA 
GTTAGACGAAAAAATTCGAT 

ORF Predictions:

ORF # Start End Direction Length 



1 397 486 F 30 aa 

>[SEQ ID NO:129] 3860890-2 ORF translation from 397-486, direction F 
VERIIRKAFAIELQEIAEKSLLVSISKMF* 

Description: 
unknown 

Assembly ID: 3860952 
Assembly Length: 874bp 

>[SEQ ID NO:40] 3860952 Strep Assembly — Assembly id#3860952 

TCGATCTAGAGAATTGCTCCAGAGCTTCCTGACCGTCCGCTGCCTCAATAGTTTCATAGC 

CACAATCCGTCAAATAATCACTGACCCCCTCACGGATCATCTCTTCATCTTCTACAATTA 

AAATTTTCATACTTTAACTGCTCTCTATTTTTTATTTTTCTTAGAATAAATACCTACTCT 

ATTTTCTATTATAGTCTCTTGCTGGCCTTTTGTATGTAAGCAACTGACCACTAGATAAAA 

CGTTGTGAAATTCCTTTCTCATAAATTCCATAACTTTAGTATATTATATTTAAGCACTAA 

AGTACAAAGAAAGCAACTGAAAGCAATGATTTTCACCACTGCTTTCAGATTTATTTTGTA

TTGTTAAATAGCTATTCCTATCCACTATTCTTGAATAGAAACACAAGATGCAATCTTTAT 

TCCAGACTCATTTTTTAAAAAATCAAATTTATTCACCATCCAGCAAGAGCTCTTTTGGTT 

GTTTTCTAAGGAGATTGCTTGAAGCAAGCGCCATAACGAGAACCACTAGAACCAAGGCAA 

GGACAAAAATGATGATAAAGTCTGATGTCTGAATGGAAATGTCTAGGCTCGACAAGGTCT 

TGCTAAAGCCATCTACTTCTGCACCGCCACCAAGGTTAGAGGCTTGAGCCGCCTTACTAG 

CCTGTTTGGCAACACCTGAAGTCACATTGGCAAGGACAGTGTTTCCAATTCGCACGGGCA 

GTGTAATTAGCTAGGAAGTAAGCANAAACTAGAGCAGGGATAGCAATCAAGATAGATTCG 

GTGATGAATTGACCCAAGATACTTGCCTGCTTGAGACCAATAGAGAGGAGGATTCCCACT 

TCCTTGCCGACGGGCATTGATCCAAAGACTGAGC 

ORF Predictions: 

ORF # Start End Direction Length 

1 449 715 R 89 aa

>[SEQ ID NO: 130] 3860952-1 ORF translation from 449-715, direction R 
VRIGNTVLANVTSGVAKQASKAAQASNLGGGAEVDGFSKTLSSLDISIQTSDFIIIFVLA
LVLVVLVMALASSNLLRKQPKELLLDGE*

Description: 
unknown 

Assembly ID: 3860962 
Assembly Length: 762bp

>[SEQ ID NO: 41] 3860962 Strep Assembly -- Assembly id#3860962

CTTGTAACGGTCATAAAGTTTCTGCAAACTACCATCCTTGCTCCATTTAGTAACCAAGTT 

ATCAAGATAGTCGTTGAGCTCTGTATTTGATTTCTTGGTAACAATACCGTAGTCAGATGG 

CTTGAAACTATCATCTAGTAGTTCTGTGCGTTTAACTAGTGTAGCCAGATAGAATAGAGC 

GGTCAACGGAAAAGGCATCGATACGATGAGCGTGAAGGGAAGTAATCAATTCTGGGTAGG 

AACCAAGTTCGACGAATTTAAACTTCAGACCTTTCTTTTTACCCAGTTCAGTAATCAGGC 

GTTGGGTGATAGAACCTTGGGCGACTCCGATGGTTTTGCCGTTTAGGTCCTCAATCTTTT 

TGATTTTGGCAGATTTATTGACCAAAAATCCAGAAGCGTCTGTGTAGTAGGGACTGGTAA 

AGTTGTAGAGTTTTTTGCGTTCGTCCGTGATGGTAAAGGTCGCGATATCCATATCGACCT 

GTTCATTGTCTAGAAGGGGGCCGCGGGTTTGTGCTGTAACCGGCACATAGTGAATCTTGA 

CCTTGAGTTCATCAGCTACCATTTTGGCCAAGTCGGTTTCGATACCAGAATAAGTACCGG 

TCTTGGGATCTTTGTTAACCAAAATTGGGAACGTCTTGTTTGACACCCGACAACCAGTTC 

GCCTCTTTTTTGAATGTCTGCGATACTAGTATTAGCCTGGACTGGTTTGGCAGCAACAAG 

GCCGAAAAGGCTAATCAATAATGCTGATAAAAAGAATTCGAT 

ORF Predictions:

ORF # Start End Direction Length 



1 152 646 R 165 aa 

>[SEQ ID NO:131] 3860962-1 ORF translation from 152-646, direction R 
VSNKTFPILVNKDPKTGTYSGIETDLAKMVADELKVKIHYVPVTAQTRGPLLDNEQVDMD 
IATFTITDERKKLYNFTSPYYTDASGFLVNKSAKIKKIEDLNGKTIGVAQGSITQRLITE 
LGKKKGLKFKFVELGSYPELITSLHAHRIDAFSVDRSILSGYTS* 

Description: 

cell adhesion factor PEB1 precursor - Campylobacter jejuni 

Assembly ID: 3861268 
Assembly Length: 1942bp 

>[SEQ ID NO:42] 3861268 Strep Assembly -- Assembly id#3861268

CTCGAATTTTTGGTGCTCCAGAAACGGTTCCAGCAGGAAGCGTTGCTTTCAAGGCATCCA 

TGGCAGTGAGTTCTGCAAGCAAACGTCCCTTGACCACACTGGTCAAATGCATGACGTAGC 

GGAAGAGCTCCACCTCCATATACTTAGTAACTTGGACACTGGCCGTTTCAGAGATGCGGC 

CAATATCGTTACGCCCCAAGTCTACCAACATTCGATGTTCTGCTGTTTCCTTCTCATCAG 

AGAGGAGGTCAGTCGCCAAGGCCTTGTCTTCTCCATCCGTAGCCCCTCTTGGTCGCGTCC 

CTGCAATCGGATTGGTTGTCACGATGCCATTTTTGACAGAAACCAAACTTTCTGGACTAG 

CTCCGATGATTTGATAATCCCCAAAATCATACAAATAAAGGTAATTAGATGGATTAGTCA 

CGCGGAGATTTCTGTAGAAGTCAAATGGATTTCCAGTTAACTTCTGCGTGAAGAAAACGC 

TGGCTGAGTTACACATCGGAACATATCTCCGTTACGAATCAAGTCACGAGCTGTTTCTAC 

CATTCCCTCAAACTTATGTGGAGCGATATGCGGTTTGAAGTCAAGTGGTGATAAATCCAA 

GTCTTCAAATTCATTTGGAGCAGGAATGCGTAATTCCTCAAGCACTTGGTTCAAGGATTT 

TTCCAAGGCCTCTTGACTGCGCTCACTATAAAGTGCATCCTCTATGACATGTTATCTTCT 

CCTTCTTGTTGGTCAAAGACCATATAGCTCTCATAGACAAAGAAATGCATGTCGGGCGTC 

CCAATTGTATCCTCAGGGATTTGACCAATTTCTTCATAAAGCGAAATCATATCGTAACCA 

ACAAAACCAATGGCTCCCCCACCAAAAGGGAGGTCTGAATGGTGCTGGCTCTTATGAATC 

ACTTCATAAAGGAAATCCAAGGGATCCCGATCAATCGCTTGACCATTTTGATAGAGAACT 

CCATTTTCAAACTTAATCTCAAAAACTGGATTATAGGCTAGGATAGAAAAACGAGCTGTT 

TCCTTGTCTCTCGGAATACTCTCTAAAATAACCTTATGTTGCCCCTTTAAGCGCATATAA 

GCCAAGATTGGTGATAAGACATCTCCATGAATGATTCGTTCCATTGTCATTTCCCTTTCA 

GTTCTAATTCGAGTTCGTGGCGACTGTATGAAAAATCCCCACGCAAAATAACTTGCGTGA 

GGACGAAATTCGCGGTGCCACCTCAATTATAGGATTTCTCCTATCTCTCATTCCTGTCTC 

AGATATCTCCTGTAACAGGCTGTGCGATAAAGGGCACTCCCTTGAGAATGATGTTTTCTT 

CTCTCGTTTCAGATGAACCCAACTTTACAGCTTTCTCTGCTTGTTTTCAGCAACCACAAG 

CTCTCTGTGAGAGAAAAGACTGTAATTTTTCCATCTATTATTTTTTAGCTTCTAGTAATC 

TGCAATCGCAGCTAGGTCCTTGCCTCCACGACCAGAGACATTGATGAAGAGATGTTCATC 

TCGGTACACCTTTATACTCTTCGAAAATCTCTTCAAACCGCGTCAACGTCGCCTTGCCGT 

AGGTATGGTTACTGACTTCGTCAGTTCTATCTGCAACCTCAAAACAGTGTTTTGAGCTGA 

CTTCGTCAGTCTTATCGACAACCTCAAAACAGTGTTTTGAGCAGCCTGCAGCTAGTTTCC 

TAGTTTGCTCTTTGATTTTCATTGAGTATTATTTCATTTTCTCCTGCAATTGAATTCTTG 

CTCAGCTTTTTGTCTTCTATTTCTTTAAAATCAAAGTAGCTCTTTTGTTAATAACTCGAT 

CAACAAACATCGTGGTACAAGTATCTACTTTGAAATTTATCAACCACTTAACAACTGATA 

CTGTATTTCTAGGAAAACGATGACATTCTTCCTAATAAAACTTCTCATATATAGCATAAA 

TTTCTACTCTTTTTAATTCGAT 

ORF Predictions:

ORF # Start End Direction Length 



1 457 645 R 63 aa 

>[SEQ ID NO:132] 3861268-1 ORF translation from 457-645, direction R 
VLEELRIPAPNEFEDLDLSPLDFKPHIAPHKFEGMVETARDLIRNGDMFRCVTQPAFSSR 
RS*

Description:

ANTHRANILATE SYNTHASE COMPONENT I (EC 4.1.3.27). - LACTOCOCCUS LACTIS
(SUBSP. LACTIS) (STREPTOCOCCUS LACTIS).

Assembly ID: 3861270 
Assembly Length: 1048bp

>[SEQ ID NO:43] 3861270 Strep Assembly -- Assembly id#3861270

CTGTTAAGATTGTTTCCGTGCATCCACATAGGATTTACCTTGTCTGTATGGGCCAATTCA 

CCCATCAAAACGCCATAGGTCTCATCTGTCAAGATACTAGACATACCGATATTGTACCAA 

AGACTGGTATGACGGAAATAAGTCGATGCGTGTAAACTCAACAAAAAGAGACGCAAGTTG 

ATTAGAAAAACCGTCATAGCAATAGCTGCCACAGGAGCTTGAACCACAATCAGTGCCAAC 

ATGGCAAACTGGGCACTCCCAGCATAAACAAAGAGACTCATCAAGCCCATCTCAACAGGT 

GTCACATAGGGCGCACCGATAGTCCCACAGGCCAGGCCGATACTGACATAGCCAAGAGCC 

GTTGGCATGGCTGCCTGCGCCCCCTCCTAAAATCCTTTTTCTTTCATCTTTCTCCTCATA 

TTGTCTTAATAATACTCAATGAAAATCAAAGAGCAAACTAGGAAATTAGCCGCAGGNTGC 

TCAAAACACCGTTTTGAGGTTGCAGATAGAAACTGACGAAGTCAGCTCAAAACACCGTTT 

TGAGGTTGCAGATAGAACTGACGAAGTCAGTAACATATATACGGCAAGGCGACGTTGACG 

TGGTTTGAAGAGATTTTCGAAGAGTATTAGAAAATGCCGATAAGGGTCTGCATACCAAGG 

CTGGTGAGGATGATGGCAATCCAGCAGACGGCTCCGAGAACAATGGATTTTCCACTGGAT 

TTGACCATAGCGACCAGATTAGTTTTGAGACCGATGGCACTCATGGCCATGATAATGAGG 

AATTTAGAGAGTTGTTTGAGAGGGGTAAAGAAACTACTAGACACACCGAGAGAGGTCAGA 

AGGGTGGTTAGGAGCGATGCAAGGATGAAGTAAAGGATAAAAAGTGGGAAGACTTTTTTC 

AGTTGTAAGCCTTGCTTATTTTTTTGCTCGCGACTTTGCCAGTAGGAGAGAAAGAGAGTG 

ATGGGGATGATAGCTAGGGTGCGCGTGAGTTTGACAATGGTTGCGGATTCGAGGGTATTG 

GTCTGGTAGAGACTGTCCCAAGCGCTAG 

ORF Predictions: 

ORF # Start End Direction Length 



1 627 824 R 66 aa 

>[SEQ ID NO: 133] 3861270-1 ORF translation from 627-824, direction R 

VSSSFFTPLKQLSKFLIIMAMSAIGLKTNLVAMVKSSGKSIVLGAVCWIAIILTSLGMQT 
LIGIF* 

Description: 
unknown 

Assembly ID: 3861288 
Assembly Length: 1571bp

>[SEQ ID NO:44] 3861288 Strep Assembly -- Assembly id#3861288

AGAGCTGGTAATATTCCCAAAGAAACGGCTCAAATCGAATTAGAAAGCCTTCTGCAAAAA 

GGAATCCCAGTCGCTCTGGTATCACGATGCTTTAACGGTATTGCCGAGCCTGTTTATGCC 

TACCAGGGTGGGGGCGTACAGTTGCAAAAAGCAGGCGTTTTCTTTGTTAAAGAACTCAAC 

GCCCAAAAAGCCCGCTTGAAACTCCTCATCGCCCTCAATGCCGGACTAACAGGACAGGCT 

TTGAAAGACTATATGGAAGGCTAATACTCTTCGAAAATCTCTGCAAACCACGTCAGCGTC 

GCCTTACCGTATGTAGAGCACAAAATCAGGAAATCTTCTCGATTCCCTGATTTTTTCTAT 

TTACGTTTTCGTGTTGAGCTACGTTCTGTCAAACCATGAGGTAAGAGAACTTCACGTTCT 

TCCAACTCTTCCTTATGCATAATCTTGGTCAACATACGCATACTAATGGCACCAAGGTCA 

TAAAGAGGTTGGGCAATCGTTGTCAAGTTTGGACGGGTAAAGCGTGAGATTTGTGAATCA 

TCACTAGTAATAATTCGATAATCTTCTGGCACAGAAACACCTTATCAGCCAAACCGTTCA 

AGACTCCTGCTGCCAACTCATCACCTGTCACAACTGCTGCAGTTGCATTTGATGAAATCA 

AACGCTCTGCTAAGGCGTAACCATCATCATAGCTATATTTAGATTCAAATACCAAACCCT 

CACTATAAGCGATTCCTGCTTTTTTCAAGGTTTCCTTGTAGCCAACTAAACGAACCTTAC 

CATTGATGTCATCCACTAGCGGACCGCTAACGAAAGCAATACGCTCATTTTCTTTAGCAA 

GGTAACTCACTGCATCAATTGTTGCTTGCTTATAGTCAATATTGACACTTGGCAACTGGT 

GCTCAACATCGACAGTTCCTGCGAGAACAATCGGAGTACGTGAACGCGAAAATTCTGAGC 

GAATTTTATCTGTCAAGTGATAACCCATATAGATAATGCCATCTACCTGCTTTGAAAAGA 

GGGTATTGACAACAGAAACTTCTTTCTCGTTATCTTCATCGCTATTAGCTAGGACAATAT 

TGTACTTGTACATTTCTGCAATATCATCAATCCCCTTAGCCAAACTCGAAAAATAACCAT 

TGGTAATATTTGGAATCACGACACCGACAGTGGTTGTCTTTTTACTTGCAAGACCACGCG 

CAACTGCATTTGGACGATAATCCAAACGATCAATTACCTCTAGCACTTTTTTACGGGTAT 

TCTCTTTTACATTTTTATTGCCATTGACCACACGGCTGACCGTCGCCATGGGAAACACCT 

GCTTCACGAGCGACATCATAAATGGTTACTGTATCATCTGCATTCATTCCTTTTCCTGTC 

CTTTCTATCTCCACACATTCTTTTACAAGTAGAAGTGCTGAATTGAAAGCTCTATATCTT 

ACTTACAAAAATGAAGATGTGAAAATTTCGTTTTCATATTTCTACTTATTCCATTCTATC 

ACTAATTGTAAACACTTTCAAGTGTTTTTTGAAGATTGATTGAAAAAATTTCATAGAAAA 

CCTAGGTTTAG 

ORF Predictions: 

ORF # Start End Direction Length 



1 357 572 R 72 aa 

>[SEQ ID NO: 134] 3861288-1 ORF translation from 357-572, direction R 

VPEDYRIITSDDSQISRFTRPNLTTIAQPLYDLGAISMRMLTKIMHKEELEEREVLLPHG 

LTERSSTRKRK* 

Description: 

GLUCOSE-RESISTANCE AMYLASE REGULATOR. - BACILLUS SUBTILIS.

Assembly ID: 3861306
Assembly Length: 1682bp

>[SEQ ID NO:45] 3861306 Strep Assembly -- Assembly id#3861306 

CTGACGTAAAAAAGATTTTCGGAAAAGTATCATCATCTATTTTAGACCATTTTCTTATAA 

TAACCATTTTATTTTTATTTGTCAAGGTCTTTGAATTCTTTCTTAAACAAGCCTTGTAAT 

CTCTACTTTTGAAGAATTTATTTTTCCTTACTGACAAGATTTGAGACGGTAGGAATCATT 

GAAAATAACCTAGCCAACATCAATCACAATCATTTCTCCTTTCTCAATTACACTAAATTA 

TAGTGTATTGAATCTATAACAGTGCACCTTGGCTGCTAAAATATTTCTATAAATTAATTT 

GACTTTCCTGATAGAGTTGTTCACATCTTATTTCAATTCACTATACTTTCCCTTATACTC 

AATGAAAATCAAAGCGCAAACTAGGAAGCTAGCCACAGGCTGCTCAAAGCACTGCTTTGA 

GGTTGTAGATAAGACTGACGAAGTCAGTTACATATATCTACGGCAAGGCGAAGCTGACGC 

GGTTTGAAGAGATTTTCGAAGAGTATAAAGTTTGTTTCTGTATCTTTCAGAAAAATAAGG 

TATACTGTATGTAAACGATTTCAAAGGAGTCCAGTTATGGCAAAAACATTTTTTATTCCA 

AATAAACAGAGCATTTTAGGAGAACAAGAGATTTTGAATGCCAAGTCGATCTTGGCTATG 

ATGTAGTCTATCTCCGTCAGCCTCTTAATCGTCTCGAGTATATTGAGTGTGCGATAGTGG 

GGCAATCACAATTTCTTTTTAAGGTCAGTTATGCTGATGGTCAAAAGGCTTACCGTGTCG 

ATCTTCCTGACCTACTAACAAAGACAGACTGGCAGATTATCAAGTCATTTTTAGATGTTT 

TGCTTGCTTATACAGGGACTGATATTGAAGGGCTAGATGGTTTTGATTTTGAAGCTTATT 

TCCAAGCAAGTATTCAAGCCTATCTAGCAGACCCTGTAGCTCGTTTTACGATTTGCCAAC 

GAATTTTTAATCCTATTTTCTTTAGTCGTGAGAACTTGAAAAGCTTTTTAGAGGCAGATG 

GCTTGGCTCAGTTTGAAGCGCGTGTGCGTGCGGTTCAAGAGACAGATGCCTACTTTGCGA 

GAGTTTCCTTCTATCAGGATGGAGAAGGAAAAGTGCATGGCGTTTACCATCTAGCTCAAG 

GAGTCAAGACAGTTTTACCGAGAGAACCGTTTGTTCCTGCAGCCTATATTGAGCGAATTG 

GTGGATAAGGAAGTCCAGTGGGAGATTGACTTGGTTCAAATCACAGGAGACGGCTCTAAA 

CCAGAAGACTATGAATCCATAGCTCGCTTGGACTATGCAAAATTCTTAGAGGTATTACCC 

CCATCTTTTTACCACCAACTAGACGCCAATCAAATAGAAATACAACCCATCCTAGGACAA 

GATTTTAAAACATTAGCACAAGAAAAGTAAAGCAGAAGCAGGTCAATCGACTTGCTTTTT 

TGACATAGAAAAAATCCTGCCAAGGATGACAGGATTGCTACTCAATGAAAATCAAAGAGC 

AAACTAGGAAGCTAGCCGCAGGCTGTACTTGAGTACGGTAAGGCGAAGCTGACGTGGTTT 

GAATTTGATTTTCGAAGAGTATGAATTTTAAAGAAAGGCCAAGATACGAAGATAATCTCC 

AATCAGTGCCACTTCAGCTTCCAAGAAGAAGAAGATTATAACTCCCGTTCCCCAAGGACA 

GA 

ORF Predictions:

ORF # Start End Direction Length 



1 717 1208 F 164 aa 

2 1201 1410 F 70 aa 

>[SEQ ID NO: 135] 3861306-1 ORF translation from 717-1208, direction F 
VGQSQFLFKVSYADGQKAYRVDLPDLLTKTDWQIIKSFLDVLLAYTGTDIEGLDGFDFEA 
YFQASIQAYLADPVARFTICQRIFNPIFFSRENLKSFLEADGLAQFEARVRAVQETDAYF
ARVSFYQDGEGKVHGVYHLAQGVKTVLPREPFVPAAYIERIGG*

Description:
unknown 

>[SEQ ID NO: 136] 3861306-2 ORF translation from 1201-1410, direction F 
VDKEVQWEIDLVQITGDGSKPEDYESIARLDYAKFLEVLPPSFYHQLDANQIEIQPILGQ
DFKTLAQEK*

Description: 
unknown 

Assembly ID: 3861334 
Assembly Length: 3041bp

>[SEQ ID NO: 46] 3861334 Strep Assembly -- Assembly id#3861334 
ATCGAATTAAAAATGAGGTATTCAGGCTTGTGATTTTCTATGGAAGTTAATAGTGATTGC 
CTCTAATGCTTACAAGTGATATTAAAAATAGAGGACCTAGTGATGTCAATCATTTCAACT 
GATTTAACCCCTTTTCAAATAGATGATACATTGAAAGCAGCCTTGCGAGAAGATGTTCAT 
TCCGAAGATTACAGTACCAATGCCATTTTTGATCATCATGGCCAAGCCAAGGTGTCGCTT 
TTTGCCAAGGAAGCTGGTGTTTTAGCGGGGCTAACCGTTTTTCAAAGGGTTTTTACCCTA 
TTTGATGCCGAGGTGACCTTCCAGAATCCTCATCAATTTAAGGATGGGGATCGTTTGACT 
AGTGGCGATTTGGTTTTAGAAATCATAGGCTCGGTGAGAAGTCTCTTAACATGTGAACGC 
GTTGCCTTGAATTTTTTACAACATTTATCAGGGATCGCTTCGATGACAGCTGCTTATGTA 
GAAGCCTTAGGCGATGATTGCATTAAGGTATTTGATACTCGAAAAACTACTCCTAATTTA 
CGTCTTTTTGAGAAATATGCCGTGAGAGTTGGCGGTGGCTATAATCATCGCTTTAATTTA 
TCAGATGCTATCCTGCTAAAAGACAATCACATTGCGGCAGTAGGTAGTGTTCAAAGGGCA 
ATTGCTCAAGCGCGTGCCTATGCTCCTTTTGTGAAAATGGTCGAGGTGGAAGTGGAAAGC 
CTTGCTGCTGCCGAAGAAGCTGCGGCGGCGGGTGCTGATATTATCATGTTGGATAATATG 
TCATTGGAACAGATTGAACAGGCCATTACCCTAATTGCAGGACGTTCTCGGATTGAATGT 
TCTGGAAATATTGATATGACCACTATTAGCCGTTTTCGTGGTTTAGCGATTGATTACGTC 
TCCAGTGGTAGTTTAACCCATAGTGCTAAGAGTCTTGATTTTTCCATGAAGGGTTTAACC 
TACCTTGATGTCTAAGTTGTAAAATAAACTAACTTTTTAAAGGATGTCTTTCCTCTAGAA 
CGAGTTTTATGTCAGATAGTTTAAACGCCTCTTCAAATATAGTAAAATGAACCAAAAATA 
GTACACAATGTGGTATAATCTTCTTATGGCATATTCAATAGATTTTCGTAAAAAAGTTCT 
TTCTTATTGTGAGCGAACAGGTAGTATAACAGAAGCATCACACGTTTTCCAAATCTCACG 
TAATACCATTTATGGCTGGTTAAAGCTAAAAGAGAAAACAGGAGAGCTAAACCACCAAGT
AAAAGGAACAAAACCAAGAAAAGTTGATAGAGATAGACTTAAAAACTATCTTACTGACAA 
TCCAGACGCTTATTTGACTGAAATAGCTTCTGAATTTGGCTGTCATCCAACTACCATCCA 
CTATGCGCTCAAAGCTATGGGCTACACTCGAAAAAAGGACCACACCTACTATGAACAAGA 
CCCAGAAAAAGTAGCCTTATTTCTTAAAAATTTTAATAGTTTAAAGCACCTAGCACCTGT 
TTAGATTGATGAAACAGGATTCGATACTTATTTTTATCGAGAATATGGTCGCTCATTAAA 
AGGTCAGTTAATAAGAGGTAAAGTATCTGGAAGAAGATATCAGAGGATTTCTTTGGTTGC 
AGGTCTAACAAATGGTGAGTTAATCGCTCCAATGACTTACGAAGAGACGATGACGAGCGA 

72 



SUBSTITUTE SHEET (RULE 26) 



WO 98/19689 



PCT/US97/19226



CTTTTTTGAAGCATGGTTTCAGAAGTTTCTCTTACCAACATTAACCACACCATCGGTTAT 
TATTATGGATAATGCAAGATTCCATAGAATGGGTAAGTTAGAACTTTTATGCGAGGAGTT 
TGGGCATAAACTTTTACCTCTTCCTCCCTACTCGCCTGAGTACAATCTTATTGAGAAAAC 
ATGGGCTCATATCAAAAAGCACCTCAAAAAGGTATTACCAAGTTGCAATACCTTTTATGA 
GGCTCTTTTGTCCTGCTCTTGTTTCAATTGACTATAGTTCACGGATACAGTTGGGAAAGA 
AGTTAAATGTAGTTGGATTTCCACTAAAGGTTGATGAGTAAGTTTTTGTATCTGAACCTG 
ATTGGCCGCAAGCAGCTAAAAGCAAAGCAGATGCAAAAGTCAGACCTGCACCAAGGACAC 
GCTTCTTTATGTTCATCTTCTTTCTCCTTAATAGTGGGAATTTGTAAAGTTAATTGAATT 
TCAAGAATGAAGGTTTTATAAACTTTGGTTATAAAAAACAAAGGATTTCTGTCTTTTATA 
CAGTCCTCCCCTTGTTTTTATACGATTTCAATTTTAAATTTTTCTGCAAAAAATATTTAT 
AGTAATTCCACACAGAAAGCATCCCATGGAACTAAGATTTGTTTTTCAAAGACTTCTTGA 
GCTAGGGTGTTTTCAATCAAGACAGATTTGACTTTTCCTTCTACTGTCAAGTCTTGCTCT 
TCATTGGACAAGTTAGCCACAACTAGGAAGCGACGGTCGCCATCCTTACGTATATAAGCA 
AAGACCTTATCAGCCGTATCAAGCAATTCAAAGTCAGCTCGAATTAGCCAACTATTCTCC 
TTGCGAATTTGGACCAGTTTCTGATAGGTATAGAAAATAGAATCTGGATTTGCCAGCGCT 
TCTTGGACGTTGATCATCTCGTAATTTGGATTAACTGCCAACCAAGGTTGACCTGTTGAG 
AAACCAGCGTTTTTGCTCTCGTCCCATTGCATAGGGGTACGGGCATTGTCACGTCCAATA 
ACACGGATACTGTCCATGATTTCTTGCATCGGAACACCTTTTTCAAGAGCCTCACGCGCA 
TAGTTGAGAGATTCAATATCTTCTACTTGATCCAGTGTTTCAAACGGATAGTTGGTCATC 
CCAATCTCCTCACCTTGGTAGATATAAGGAGTTCCTCTCATAAGATGAAGCAAGATTGCA 
AAGGCTTTGGCAGATTTTTCGCGGTATTCTTGGTCATTTCCCCAGATTGAGACAATACGA 
GGGAGGTCATGGTTGTTCCAGAAGAGGGAATTCCAGCCGTCCTCAACTCCTAACTCTGTC 
TGCCATTTGTTGAAGATTTCTTTTAACTTAGCGATATTCAG 

ORF Predictions: 

ORF # Start End Direction Length 



1 76 975 F 300 aa 

>[SEQ ID NO: 137] 3861334-1 ORF translation from 76-975, direction F 

VILKIEDLVMSIISTDLTPFQIDDTLKAALREDVHSEDYSTNAIFDHHGQAKVSLFAKEA 

GVLAGLTVFQRVFTLFDAEVTFQNPHQFKDGDRLTSGDLVLEIIGSVRSLLTCERVALNF 

LQHLSGIASMTAAYVEALGDDCIKVFDTRKTTPNLRLFEKYAVRVGGGYNHRFNLSDAIL 

LKDNHIAAVGSVQRAIAQARAYAPFVKMVEVEVESLAAAEEAAAAGADIIMLDNMSLEQI

EQAITLIAGRSRIECSGNIDMTTISRFRGLAIDYVSSGSLTHSAKSLDFSMKGLTYLDV*

Description:

PROBABLE NICOTINATE-NUCLEOTIDE PYROPHOSPHORYLASE (CARBOXYLATING) (EC
2.4.2.19) (QUINOLINATE PHOSPHORIBOSYLTRANSFERASE (DECARBOXYLATING))
(QAPRTASE) (FRAGMENT). - BACILLUS SUBTILIS (BLAST)

Assembly ID: 3864148 
Assembly Length: 4694bp

>[SEQ ID NO:47] 3864148 Strep Assembly -- Assembly id#3864148

TTAATTTAAATTCTTAAAATTTTTTCATAATAATCTCCCTATAAAAATAAAGTCGCCCAA 

TCAGGCGGCTTATTTTTTTGAAAAATGGGCTTGGTGCCTGAGAATAAATAGCTTAGTGAT 

AGAAGAAAATGGGGAAATATGGTATAATGAAACGATAGATTTTTGAATAGGAATAAGATC 

ATGTTTGGATTTTTTAAGAAAGATAAAGGCTGTGGAAGTAGAGGTTCCGACACAGGTTCC 

TGCTCATATCGGCATCATCATGGATGGCAATGGCCGTTGGGCTAAAAAACGTATGCAAGC 

GCGAGTTTTTGGACATAAGGCGGGCATGGAAGCATTGCAAACCGTGACCAAGGCAGCCAA 

CAAACTGGGCGTCAAGGTTATTACGGTCTATGCTTTTTCTACGGAAAACTGGACCCGTCC 

AGATCAGGAAGTCAAGTTTATCATGAACTTGCCAGTAGAGTTTTATGATAATTATGTCCC 

GGAACTACATGCGAATAATGTTAAGATTCAAATGATTGGGGAGACAGACCGCCTGCCTAA 

GCAAACCTTCGAAGCTTTAACCAAGGCTGAGGAATTGACTAAGAACAACACAGGATTGAT 

TCTTAATTTTGCTCTTAACTATGGTGGACGTGCTGAGATTACACAGGCGCTTAAGTTGAT 

TTCCCAGGATGTTTTAGATGCCAAAATCAACCCAGGTGACATCACAGAGGAATTGATTGG 

TAACTATCTCTTTACCCAGCATTTGCCTAAGGACTTACGAGACCCAGACTTGATTATCCG 

TACTAGTGGAGAATTGCGTTTGAGCAATTTCCTTCCATGGCAGGGAGCCTATAGTGAGCT 

TTATTTTACGGACACCTTATGGCCTGATTTTGACGAAGCGGCCTTGCAGGAAGCTATTCT 

TGCCTATAATCGTCGCCATCGCCGATTTGGAGGAGTTTAGGAGGAAATATGACCCAGGAT 

TTACAGAAAAGAACCTTGTTATGCAGGGATTGCCCTGACTATTTTCCTACCAATTTTAAT 

GATTGGGGGCTCTTGCTTCAGATAGCAATCGGAATCATANCCATGCTAGCCATGCATGAA 

CTTTTGAAGATGAGAGGTCTAGAGACCATGACGATGGAGGCCTCTTGACCCTCTTTGCAC 

NTTNGTATTGACCATTCCCCTGGAATCGAATTACCTGACTTTTTTGCCAGTTGATGGGAA 

TGTGGTTGCCTATAGTGTTTTGATTTCAATCATGTTAGGAACGACCGTTTTTAGCAAGTC 

TTATACGATTGAGGATGCGGTTTTCCCTCTTGCTATGAGCTTCTACGTGGGCTTTGGATT 

TAATGCTTTACTAGATGCTCGTGTTGCAGGTTTGGACAAGGCTCTCTTAGCCTTGTGTAT 

CGTCTGGGCGACAGACAGTGGTGCCTATCTTGTTGGGATGAACTATGGGAAACGAAAGTT 

AGCACCAAGGGTATCGCCTAATAAAACCCTTGAGGGTGCCTTGGGTGGTATTTTAGGAGC 

AATTTTAGTAACCATTATCTTTATGATAGTTGACAGTACAGTTGCTCTTCCATATGGAAT 

TTACAAGATGTCAGTCTTTGCTATTTTCTTTAGCATTGCTGGACAATTTGGTGATTTACT 

AGAAAGTTCGATCAAACGTCATTTTGGTGTTAAGGATTCTGGGAAATTTATCCCTGGACA 

TGGTGGTGTTTTGGATCGTTTCGATAGTATGTTGCTTGTATTTCCAATCATGCACTTATT 

TGGACTCTTTTAATCAAAAGACGGAGGAAACGCTATGCTCGGAATTTTAACCTTTATTCT 

GGTTTTTGGGATTATTGTAGTGGTGCACGAGTTCGGGCACTTCTACTTTGCCAAGAAATC 

AGGGATTTTAGTACGTGAATTTGCCATCGGTATGGGACCTAAAATCTTTGCTCACATTGG 

CAAGGATGGAACGGCCTATACCATTCGAATCTTGCCTCTGGGTGGCTATGTCCGCATGGC 

CGGTTGGGGTGATGATACAACTGAAATCAAGACAGGAACGCCTGTTAGTTTGACACTTGC 

TGATGATGGTAAGGTTAAACGCATCAATCTCTCAGGTAAAAAATTGGATCAAACAGCCCT 

CCCTATGCAGGTGACCCAGTTTGATTTTGAAGACAAGCTCTTTATCAAAGGATTGGTTCT 

GGAAGAAGAAAAAACATTTGCAGTGGATCACGATGCAACGGTTGTGGAAGCAGATGGTAC 

TGAGGTTCGGATTGCACCTTTAGATGTTCAATATCAAAATGCGACTTTATCTGGGGCAAA 

CTGATTACCAATTTTGCAGGTCCTATGAACAATTTTATCTTAGGTGTTGTTGTTTTTTGG 

GTTTTAATCTTTATGCAGGGTGGTGTCAGAGATGTTGATACCAATCAGTTCCATATCATG 

CCCCAAGGTGCCTTGGCCAAGGTAGGAGTACCAGAAACGGCACAAATTACCAAGATTGGC 

TCACATGAGGTTAGCAACTGGGAAAGCTTGATCCAAGCTGTGGAAACAGAAACCAAAGAT 

AAGACGGCACCGACTTTGGATGTGACTATTTCTGAAAAGGGGAGTGACAAACAAGTCACT 

GTTACACCCGAAGATAGTCAAGGTCGTTACCTTCTAGGTGTTCAACCGGGGGTTAAGTCA 

GATTTTCTATCCATGTTTGTAGGTGGTTTTACAACTGCTGCTGACTCAGCTCTCCGAATT 

CTCTCAGCTCTGAAAAATCTGATTTTCCAACCGGATTTGAACAAGTTGGGTGGACCTGTT 

GCTATCTTTAAGGCAAGTAGTGATGCTGCTAAAAATGGAATTGAGAATATTCTTGTACTT 

CTTGGCAATGATTTCCATCAATATTGGGATTTTTAATCTTATTCCGATTCCAGCCTTGGA 

TGGTGGTAAGATTGTGCTCAATATCCTAGAAGCCATCCGCCGCAAACCATTGAAACAAGA 

AATTGAAACCTATGTCACCTTGGCCGGAGTGGTCATCATGGTTGTCTTGATGATTGCTGT 

GACTTGGAATGACATTATGCGACTCTTTTTTAGATAATCGAGGAATATTATGAAACAAAG 

TAAAATGCCTATCCCAACGCTTCGCGAAATGCCAAGCGATGCTCAAGTTATCAGCCATGC 

TCTTATGTTGCGTGCTGGTTATGTTCGCCAAGTTTCAGCAGGTGTTTATTCTTATCTACC 

ACTTGCCAACCGTGTGATTGAAAAAGCTAAAAACATCATGCGCCAAGAATTCGAAAAGAT 

TGGTGCTGTTGAGATGTTGGCTCCAGCCCTTCTTAGTGCAGAATTGTGGCGTGAATCAGG 

TCGTTACGAAACCTATGGTGAAGACCTTTACAAACTGAAAAACCGTGAAAAATCAGACTT 

TATCTTAGGTCCAACTCACGAAGAAACCTTTACAGCTATTGTCCGTGATTCTGTTAAATC 

TTACAAGCAATTGCCACTCAACCTTTATCAAATTCAGCCCAAGTATCGTGATGAAAAACG 

CCCACGTAATGGACTTCTTCGTACACGTGAGTTTATCATGAAGGATGCTTATAGTTTCCA 

CGCTAACTATGATAGTTTGGATAGTGTTTATGATGAGTACAAAGCAGCCTATGAGCGTAT 

TTTCACTCGTAGTGGTTTAGACTTCAAGGCTATTATTGGTGACGGTGGAGCCATGGGTGG 

TAAGGATAGCCAAGAATTTATGGCCATTACATCTGCTCGTACAGACCTTGACCGCTGGGT 

TGTCTTGGACAAGTCAGTTGCCTCATTTGACGAAATTCCTGCAGAAGTGCAAGAAGAAAT 

CAAGGCAGAATTGCTCAAATGGATAGTCTCTGGTGAAGATACCATTGCTTACTCAAGTGA 

GTCTAGCTATGCAGCTAACTTAGAAATGGCAACAAACGAGTACAAACCAAGCAACCGTGT 

TGTCGCTGAAGAAGAAGTTACTCGTGTTGAAACGCCAGATGTTAAATCAATTGATGAAGT 

TGCAGCCTTCCTCAATGTTCCAGAAGAACAAACGATTAAAACCCTCTTCTACATTGCAGA 

TGGTGAGCTTGTTGCAGCCCTTCTAGTTGGAAATGACCAACTCAACGAAGTCAAGTTGAA 

AAATCACTTGGGAGCAAATTTCTTTGACGTTGCTAGCGAAGAAGAAGTGGCGAATGTTGT 

TCAAGCAGGATTTGGTTCACTTGGACCAGTTGGTTTGCCAGAGAATATTAAAATTATTGC 

AGATCGTAAGGTGCAAGATGTTCGCAATGCAGTTGTCGGTGCTAACGAAGATGGCTACCA 

CTTGACTGGTGTGAACCCAGGCCGTGATTTTACTGCAGAATATGTGGATATCCGTGAAGT 

TCGTGAGGGTGAAATTTCCCCAGATGGACAAGGTGTCCTTAACTTTGCGCGTGGTATTGA 

GATCGGTCATATTTTCAAACTCGGAACTCGCTATTCAGCAAGCATGGGAGCAGATGTCTT 

GGATGAAAATGGTCGTGCTGTGCCAATCATCATGGGATGTTACGGTATCGGTGTCAGCCG 

TCTTCTTTCAGCAGTGATGGAGCAACACGCTCGCCTCTTTGTTAACAAAACGCCAAAAGG 

TGAATACCGTTACGCTTGGGGAATCAATTTCCCTAAAGAATTGGCACCATTTGATGTGCA 

TTTGATTACTGTTAATGTCAAGGATGAAGAAGCGCAAGCCTTGACAGAAAAACTTGAAGC 

AAGCTTGATGGGAG 

ORF Predictions:

ORF # Start End Direction Length 

1 212 940 F 243 aa 
2 1202 1753 F 184 aa

3 2750 3037 F 96 aa 
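The Length column appears to follow from the coordinates alone. An ORF from Start to End spans End - Start + 1 nucleotides, and one third of that is the listed length. The count seems to include the terminal stop codon, which the translations print as *. For instance, the 41 aa entry for SEQ ID NO: 150 is 40 residues followed by *. The following is a minimal Python check under those assumptions (1-based, inclusive coordinates); the helper name orf_length_codons is illustrative and does not come from the source.

```python
def orf_length_codons(start: int, end: int) -> int:
    """Number of codons in an ORF with 1-based inclusive coordinates.

    Assumes the span is a whole number of codons and that the count
    includes the terminal stop codon.
    """
    span = end - start + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError(f"span {start}-{end} is not a whole number of codons")
    return span // 3


# Start, End and Length (aa) as listed above for assembly 3864148.
for start, end, listed_aa in [(212, 940, 243), (1202, 1753, 184), (2750, 3037, 96)]:
    assert orf_length_codons(start, end) == listed_aa
```

The same relation holds for the other tables in this listing, for example 930-1616 (229 aa) and 1079-1201 (41 aa).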

>[SEQ ID NO:138] 3864148-1 ORF translation from 212-940, direction F 
VEVEVPTQVPAHIGIIMDGNGRWAKKRMQPRVFGHKAGMEALQTVTKAANKLGVKVITVY
AFSTENWTRPDQEVKFIMNLPVEFYDNYVPELHANNVKIQMI 

ELTKNNTGLILNFALNYGGRAEITQALKLISQDVLDAKINPGDITEELIGNYLFTQHLPK 
DLRDPDLIIRTSGELRLSNFLPWQGAYSELYFTDTLWPDFDEAALQEAILAYNRRHRRFG 
GV* 

Description: 
unknown 

>[SEQ ID NO: 139] 3864148-2 ORF translation from 1202-1753, direction F 

WAYSVLISIMLGTTVFSKSYTIEDAVFPLAMSFYVGFGFNALLDARVAGLDKALLALCI 

VWATDSGAYLVGMNYGKRKLAPRVSPNKTLEGALGGILGAILVTIIFMIVDSTVALPYGI

YKMSVFAIFFSIAGQFGDLLESSIKRHFGVKDSGKFIPGHGGVLDRFDSMLLVFPIMHLF 

GLF* 

Description:

CDP-diglyceride synthetase (cdsA) homolog - Haemophilus influenzae
(strain Rd KW20)

>[SEQ ID NO: 140] 3864148-10 ORF translation from 2750-3037, direction F

VDLLLSLRQVVMLLKMELRIFLYFLAMISINIGIFNLIPIPALDGGKIVLNILEAIRRKP

LKQEIETYVTLAGVVIMVVLMIAVTWNDIMRLFFR*

Description: 
unknown 

Assembly ID: 3864172 
Assembly Length: 1352bp

>[SEQ ID NO: 48] 3864172 Strep Assembly -- Assembly id#3864172

CTCGTAAGTTCGGAAGCTATCTACACAAGAAATTAACCGCTGCCTAAAGGAGAAGCCATG 

TCAACATATAACTGGGATGAGAAGCATATCCTTACCTTTCCTGAAGAAAAAGTAGCCCTT 

TCTACTAAGGATGTCCATGTTTACTATGGTAAAAATGAATCCATTAAGGGGATTGATATG 

CAATTTGAAAGAAATAAAATTACAGCTTTGATTGGTCCGTCGGGATCGGGGAAATCTACC 

TACTTACGCAGTCTCAATCGCATGAATGATACCATTGATATTGCTAAAGTAACTGGGCAG 

ATTCTCTATCGTGGAATTGATGTCAACCGTCCAGAAATCAACGTTTATGAAATGCGTAAA 

CACATTGGAATGGTTTTTCAACGCCCCAATCCATTTGCTAAATCGAATTTACCGTAATAT 

TACCTTTGCGCATGAACGTGCTGGAGTTAAGGATAAGCAAGTCCTAGATGAAATCGTAGA 

AACCTCCCTTAGTCAGGCTGCCCTTTGGGATCAGGTTAAAGACGATCTCCACAAGTCAGC 
CTTGACCTTATCAGGTGGTCAGCAACAACGTCTCTGTATCGCTCGTGCCATCTCTGTTAA
GCCAGATATCCTCTTAATGGATGAGCCAGCCTCAGCCTTGGATCCGATTGCGACCATGCA 
ACTAGAAGAGACCATGTTTGAGCTCAAGAAAAACTTTACCATCATCATTGTAACGCATAA 
TATGCAGCAGGCTGCTCGTGCAAGTGACTATACAGGCTTCTTTTACTTGGGTGATTTGAT 
TGAGTATGACAAGACTGCAACTATTTTCCAAAATGCCAAGCTACAGTCCACCAATGACTA 
TGTATCTGGTCACTTTGGTTAGAAAGGAAACCGTATGACAGATGCGATTTTACAGGTATC 
AGACCTGTCCGTTTATTATAATAAAAAGAAGGCTTTGAATAGTGTTTCCCTATCTTTCCA 
ACCTAAGGAAATTACAGCCTTGATTGGTCCATCTGGATCAGGGAAGTCAACCCTCCTCAA 
GTCTCTCAACCGCATGGGAGATCTCAATCCAGAGGTGACCACAACTGGATCCGTGGTGTA 
CAATGGTCACAACATCTACAGTCCGCGTACAGATACGGTTGAATTACGTAAGGAAATCGG 
AATGGTTTTCCAACAACCTAATCCTTTCCCTATGACTATCTATGAGAATGTTGTCTACGG 
GCTTCGTATCAATGGAATTAAGGATAAGCAGGTTCTGGATGAAGCCGTAGAAAAAGCCTT 
GCAAGGTGCCTCTATCTGGGATGAGGTCAAGGATCGTCTATATGATTCAGCTATTGGATT 
GTCAGGTGGTCAACAGCAGCGTGTCTGCGTGG 

ORF Predictions: 

ORF # Start End Direction Length 



1 311 862 F 184 aa 

>[SEQ ID NO: 141] 3864172-2 ORF translation from 311-862, direction F 

VELMSTVQKSTFMKCVNTLEWFFNAPIHLLNRIYRKITFAHERAGVKDKQVLDEIVETSL 

SQAALWDQVKDDLHKSALTLSGGQQQRLCIARAISVKPDILLMDEPASALDPIATMQLEE 

TMFELKKNFTIIIVTHNMQQAARASDYTGFFYLGDLIEYDKTATIFQNAKLQSTNDYVSG

HFG* 

Description:

HYPOTHETICAL ABC TRANSPORTER (ORF75). - BACILLUS SUBTILIS. (BLAST)

Assembly ID: 3864180 
Assembly Length: 2258bp

>[SEQ ID NO: 49] 3864180 Strep Assembly -- Assembly id#3864180
AACTTCGACCGTGATAAACAAGCTGAGCTTTGACATACTTGTAGCCAACCTAAAAGCCGT 
TCTTCAAGGCCTCAAACCAGCTGCAACTCATTCAGGAAGCCTGGATGAAAATGAAGTGGC 
TGCCAATGTTGAAACCAGACCAGAACTCATCACAAGAACTGAAGAAATTCCATTTGAAGT 
TATCAAGAAAGAAAATCCTAATCCCAGCTGGTCAGGAAATATTATCACAGCAGGAGTCAA 
AGGTGAACGAACTCATTACATCTCTGTACTCACTGAAAATGGAAAAACAACAGAAACAGT 
CCTTGATAGCCAGGTAACCAAAGAAGTTATAAACCAAGTGGTTGAAGTTGGCGCTCCTGT 
AACTCACAAGGGTGATGAAAGTGGTCTTGCACCAACTACTGAGGTAAAACCTAGACTGGA 
TATCCAAGAAGAAGAAATTCCATTTACCACAGTGACTCGTGAAAATCCACTCTTACTCAA 
AGGAAAAACACAAGTCATTACTAAGGGTGTCAATGGACATCGTAGCAACTTCTACTCTGT 
GAGCACTTCTGCCGATGGTAAGGAAGTGAAAACACTTGTAAATAGTGTCGTAGCACAGGA 
AGCCGTTACTCAAATAGTCGAAGTCGGAACTATGGTAACACATGTAGGCGATGAAAACGG
ACAAGCCGCTATTGCTGAAGAAAAACCAAAACTAGAAATCCTAAGCCAACCAGCTCCTGC 
TGAGGAAAGCAAAGCTCTTCCTCAAGATCCAGCTCCTGTGGTAATAGAGAAAAAACTTCC 
TGAAACAGGAACTCACGATTCTGCAGGGACTAGTAGTCGCAGGACTCATGGCCACACTAG 
CAGCCTATGGACTCACTAAAAGAAAAGAAGACTAAGTCTTTTCGATAAAAAATAAACAGC 
GAGATTGAAGCTCGCTGTTTATTTTTTAATTAATCACCTAGTCCAAGACGTTCAAAGATA 
TCATCCACTCGTTTGGTGTAATAAACTGGGTTGAAGATTTCATCGATTTCTTCTTGTGTG 
AGACGTGATGTTACTTCTGAATCTGCCTCAAGAAGTGGTTTAAAGTCTACTTGGTTGTCC 
CAAGAGTAGGCTGTTTTTGGTTGCACCAAGTCATAGGCTTGCTCACGGGTCATGCCTTTT 
TCAATCAATGTCAACATAGCCCGTTGGCTAAAGATAAGACCAAAAGTCGAGTTCATGTTT 
CGGATCATATTTTCTGGGAAGACTGTCAAGTTCTTGACGATATTTCCAAAACGGTTGAGC 
ATGTAGTCAATCAAAATGGTCGTATCTGGTGTGATGATACGCTCAGCTGATGAGTGAGAA 
ATATCGCGTTCGTGCCAGAGAGCGACGTTTTCATAAGCCGTAATCATGTGACCACGAATG 
ACACGCGCCAGACCAGTCATATTTTCAGAACCGATTGGGTTGCGTTTGTGAGGCATTGCT 
GAAGACCCTTTTTGCCCTTTAGCAAAGAACTCTTCTACTTCGCGTTGCTCAGATTTTTGT 
AGACCACGAATCTCAGTCGCCATACGTTCGATTGAAGTCGCAATGCTGGCAAGAACCGCA 
AAGTACTCAGCGTGAAGGTCACGAGGAAGGACTTGTGTTAAAGATTCCTTGGGCACGGAT 
GCCAAGATTTATCGCAGACATACTCCTCTACAAATGGTGGGATATTGGCAAAGTTCCCAA 
CCGCACCAGAAATCTTACCAGCTTCTACACCAGCAGCCGCATGCTCGAAGCGCTCGATAT 
TGCGTTTCATTTCGCTGTACCAAGTTGCTAATTTAAGACCAAAGGTTGTCGGCTCAGCGT 
GCACACCATGAGTACGCCCCATCATGATGGTGAACTTGTGCTCCTTGGCCTTGTCAGCGA 
TGATATTAGTGAAGTTTTCAAGGTCACGACGGATGATGTCGTTGGCCTGCTTGTAGAGGT 
AACCATAAGCAGTATCCACCACGTCGGTAGAAGTTAACCCATAGTGAACCCACTTGCGCT 
CTTCACCAAGAGTCTCAGAAACCGCACGCGTGAAAGCCACCACATCGTGGCGCGTCTCCT 
GCTCAATTTCCAAAATACGGTCGATGTCAAAGTCCGCCTTCTTGCGAATCAAAGCCACAT 
CTTCCTTAGGGATTTCCCCCAACTCAGCCCATGCCTCGTCAGAGAGGATTTCCACCTCAA 
GCCAAGCACGGTATTTATTTTCTTCACTCCAAATATTCGCCATCTCAGGGCGAGAGTAAC 
GGTTGATCATGTGTTAATTTTTCCTTTCTTCTTAAGAT 

ORF Predictions:

ORF # Start End Direction Length 



1 930 1616 R 229 aa 
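For direction R entries, Start is still smaller than End, so the coordinates seem to give the span on the forward strand of the assembly; the ORF is read on the opposite strand, from End back toward Start. A minimal Python sketch of that reading, assuming 1-based inclusive coordinates and the standard genetic code (all function names here are hypothetical). Because every codon goes through the same table, a GTG initiation codon comes out as V, which matches how most translations in this listing begin.

```python
# Standard genetic code (NCBI translation table 1); codons are enumerated with
# the bases in T, C, A, G order at the first, second and third positions.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an uppercase ACGT string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate codon by codon from the first base; stops print as '*'."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def translate_orf(assembly: str, start: int, end: int, direction: str) -> str:
    """Translate assembly[start..end] (1-based, inclusive, start < end).

    For direction "R" the same forward-strand span is reverse-complemented
    first, so the ORF is read from End back toward Start.
    """
    segment = assembly[start - 1:end]
    if direction == "R":
        segment = reverse_complement(segment)
    elif direction != "F":
        raise ValueError("direction must be 'F' or 'R'")
    return translate(segment)


# Forward: a stretch of assembly 3864172 inside ORF 311-862.
print(translate("TTTACCATCATCATTGTAACGCATAAT"))  # FTIIIVTHN
# Reverse: forward-strand bases 1084-1104 of assembly 3864194 (ORF 1084-1380, R).
print(translate_orf("CTAATTAGTTGCTTCATCAGC", 1, 21, "R"))  # ADEATN*
```

The second call yields the C-terminal residues and stop of SEQ ID NO: 145, whose stop codon lies at forward positions 1084-1086 (CTA, read as TAG on the reverse strand).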

>[SEQ ID NO: 142] 3864180-2 ORF translation from 930-1616, direction R 
VPKESLTQVLPRDLHAEYFAVLASIATSIERMATEIRGLQKSEQREVEEFFAKGQKGSSA 
MPHKRNPIGSENMTGLARVTRGHMITAYENVALWHERDISHSSAERIITPDTTILIDYML
NRFGNIVKNLTVFPENMIRNMNSTFGLIFSQRAMLTLIEKGMTREQAYDLVQPKTAYSWD
NQVDFKPLLEADSEVTSRLTQEEIDEIFNPVYYTKRVDDIFERLGLGD* 

Description: 

ADENYLOSUCCINATE LYASE (EC 4.3.2.2) (ADENYLOSUCCINASE) (ASL). -
BACILLUS SUBTILIS.

Assembly ID: 3864184
Assembly Length: 4392bp

>[SEQ ID NO: 50] 3864184 Strep Assembly -- Assembly id#3864184 

CCCTTTTGCCTCTCCCTTTGGTGCAGATTCTTTTGGGAATTGTGATTGGTCTCTTTTTAC 

CCAATACTGACTTTCATCTTAATACGGAGTTGTTTTTGGCCTGGTTATCGGACCCTTGCT 

TTTCCGAGAGGCTGAAGAAGCAGATGTTACGGCTATTTTAAAACACTGGCGAATCATTGT 

TTATCTCATATTTCCAGTGATTTTTATCTCGACCCTGAGTTTGGGTGGCTTGGCCCATCT 

TCTTTGGTTCAGCCTTCCCTTGGCAGCTTGCTTGGCTGTTGGGGCAGCCCTTGGTCCTAC 

GGACTTGGTGGCCTTTGCCTCTCTTTCGGAGCGTTTTAGCTTTCCTAAGCGCGTGTCCAA 

TATTCTTAAGGGCGAAGGACTCTTGAATGATGCTTCTGGTTTGGTGGCTTTTCAGGTAGC 

TTTGACAGCTTGGACAACTGGAGCTTTTTCTCTGGGGCAAGCTAGCAGTTCGCTCATCTT 

TTCAATCCTAGGCGGTTTTTTAATTGGATTTTTAACAGCCATGACCAACCGCTTCCTCCA 

TACCTTCTTGCTAAGTGTGCGCGCAACGGATATTGCCAGTGAACTTTTATTAGAATTCGA 

GTTTGCCTCTAGTGACCTTCTTTCTGGCAGAAGAAGTCCATGTTTCAGGGATTATTGCCG 

TCGTAGTTGATCGAATTTTAAAGGCAAGTCGCTTCAAGAAAATCACGCTCCTCGAAGCCC 

AAGTGGATACGGTGACCGAGACGGTCTGGCATACAGTGACCTTTATGCTCAACGGTTCTG 

TCTTTGTGATTTTAGGGATGGAGTTGGAAATGATAGCAGAACCTATCTTGACCAATCCAA 

TCTATAATCCTCTACTTTTATTGCTATCTCTCATCGCCCTTACCTTTGTCCTCTTTGTCA 

TTCGTTTTATTATGATCTATGGCTATTATGCCTATAGAACCCGACGCCTAAAGAAAAAGC 

TAAATAAGTATATGAAGGACATGTTTCTCTTGACCTTTTCAGGTGTTAAGGGAACGGTGT 

CGATTGCTACGATTCTCTTGATACCAAGTAATCTAGAACAGGAGTATCCTCTCTTGCTTT 

TCCTTGTTGCAGGTGTGACGCTTGTCAGCTTTTTAACAGGTCTCTTGGTCTTGCCTCATC 

TTTCTGATGAAGAGGAAGAAAGCAAGGATTATCTCATGCATATCGCCATTTTGAATGAAG 

TAACGCTAGAGTTGGAAAAAGAGTTGGAAGACACCAGAAATAAACTTCCCCTCTATGCGG 

CTATTGACAATTCGATCATGGACGTATTGAAAATCTCATTTTAAGCCAAGAAAACCAGGA 

TGATCAAGAAGACTGGGCTGCTTTGAAAATCGAATTCTTAGTATTGAAAGTGATGGTTTG 

GAACAGGCCTATGAAGAGGGGAACATTAGCAATCGTGCTTACCGAGTTTACCAACGTTAT 

CTGAAAAATATAGAACAAGGAATCAATCGTAAACTTGCCTCAAGACTGACCTATTATTTT 

CTTGTTTCCTTGAGGATTTTACGTTTTCTTCTTCATGAAGTTTTTACTCTTGGAAAGACC 

TTCCGTAGCTGGAAGGACAAGGAGCAAAGCCGTCTCCGTGCTCTTGATTATGACCAAATT 

GCAGAGCTCTATCTTGCCAATACAGAGATGATTATTGAAAGTTTGGAAAACCTGAAGGGA 

GTCTACAGACGCTCTTTGATTAGTTTTATGCAGGAGTCTCGTCTTCGAGAAACAGCTATT 

ATCAGCAGTGGTGCCTTTGTCGAACGGGTTATCAATCGTGTCAAACCCAACAATATCGAT 

GAAATGCTGAGAGGCTATTATCTGGAGCGCAAGTTGATTTTCGAATACGAAGAAAAACGA 

TTGATTACGACTAAGTATGCCAAGAAATTACGACAAAATGTAAATAACTTAGAGAACTAT 

TCCTTGAAGGAAGCTGCCAATACCCTGCCGTATGATATGGTGGAATTGGTAAGAAGAAAT 

TAGTTAATACTCTTCGAAAATCTCTTCAAACCACGTCAGCGTCGCCTTGGATTATATATG 

TGACTGACTTCGTCAGTTTCATCTACAACCTCAAAGCAGGGCTTTGAGCAACCTGCGGCT 

AGCTTCCTAGTTTGCTCTTTGATTTTCATTGAGTATAAGATTGTAAGTGAAGGAGTGTGA 

CATGAAAAAATGGGGAAAGAGCCTGAACTAGTCCTGTCTACTTTTACCCAATCACACTTC 

CATTTGGTACAGCTGGATCAACTGTGAGAAGGGATCGAATTTGCCATCATGTTCAGCTGA 
GAGAATCATACCCTGGCTGACATATTTTTTCATCATTTTACGTGGTTTGAGGTTAGCAAC
GATTTGAACTTTCTTGCCGACCAATTCTTGTTCATTTGGATAGTATTTTGCAATTCCTGA 
AAGAATCTGACGATCTTCTCCATCACCAGCATCCAAGCGGAATTGAAGCAACTTATCTGA 
ACCTTCTACTTTAGACACTTCTTTGACTTCTGCGACACGGATTTCAACCTTGTCAAAGTC 
TTCAAACTTGATTTCATCCTTGTTTAGTTTGAGCTCAACTTCGTCCGGATTCCATTCTTT 
TTCGACTGCTGGTTTATTGCCTTCCATTTGTTCCTTGATATAGGCGATTTCTTCTTCCAT 
ATTTAGACGTGGAAAGATAGGTGTTCCTTTGGCAACTACAGTCACATCTGCTGGGAAGTC 
AGCCAAACTCAAGTTTTCAAGACTAGAAACTTCTTCCAAACCAAGTTGAGTCAAAACTGC 
ACGACTAGTTTCCATCATAAATGGTTCAATCAAGTGAGCAACTACACGAATGCTGGCTGC 
CAAGTGGCTCATGACACTTGCCAATTGGTCACGAAGAGCTTCATCCTTGTCCAAGACCCA 
TGGTGCAGTCTCATCGATGTATTTATTGGTACGAGAGATCAGAGTCCAGACTGCTTCAAG 
CGCACGTGGATAGTCAACTGCTTCCATGTGTGTATGGAAGTCTGCGATTGATTTTTCTGC 
AACCTCAGCAAGAACATGATCAAATTCAGTCACACCTTCTACATAGGCAGGGATTTGTCC 
ATCAAAGTACTTATTAATCATGGAAACCGTACGGTTAAGGAGGTTCCCAAGGTCATTAGC 
CAATTCATAGTTGATACGACCGACATAGTCTTCAGGAGTAAAGGTTCCGTCTGAACCAAC 
TGGAAGGTTACGCATGAGGTAGTAACGAAGTGGATCTAGTCCATAACGCTCTACCAACAT 
TTCAGGGTAAACGACATTCCCTTTTGACTTAGACATTTTTCCGTCTTTCATGACAAACCA 
ACCATGGGCAATCAAACGATCAGGTAATTTAACATCCAACATCATAAGAAGGATTGGCCA 
GTAGATAGAGTGGAAGCGAAGGATGTCTTTTCCTACCATATGGAAGACTGTTCCATTCCA 
GAACTTGTCAAAGTTACCATGTTCGTCTTGAGCGTAGCCAAAAGCTGTCGCATAGTTAAG 
AAGGGCATCAATCCAAACGTAGACAACGTGTTTTGGATTTGATGGGACAGGCACTCCCCA 
TGTAAAGGTTGTACGAGATACCGCCAAATCTTCCAAACCTGGCTCGATGAAGTTGCGTAG 
CATTTCATTAAGACGACCATCTGGCGTGATAAATTCAGGATGAGCTTTGAAAAATTCGAC 
CAAACGGTCTTGGTATTTGCTAAGGCGAAGGAAGTATGATTCTTCAGAAACCCATTCAAC 
CTCATGACCTGATGGAGCAATACCACCAGTCACATTTCCAGCTTCATCACGGAAAACTTC 
TGCCAGCTGGCTTTCTGTAAAGAATTCTTCGTCTGATACTGAATACCAACCAGAGTATTC 
ACCCAAGTAGATATCATCTTGAGCAAGTAAGCGTTCAAAGACCTGTGCGACAACTTTTTC 
ATGGTAGTCATCGGTTGTACGGATAAATTTATCGTATGAGATATCTAGTAATTGCCAGAG 
TTCTTTAACTCCAACCGCCATTCCATCAACATAGGCTTGAGGTGTAATACCAGATTCGAA 
TTCCGCTTTCTGCTGGATTTTCTGACCATGTTCATCAAGACCTGTCAGATAAAATACATC 
GTAGCCCATCAGGCGTTTGTAACGTGCTAGGACATCACATGCGATAGTTGTGTAGGCAGA 
ACCGATATGAAGTTTCCCAGATGGATAGTAAATCGGCGTTGTAATATAAAAATTTTTTTC 
AGACATAATTTTTCCTTTCCAGGCAAATGAAACCTGTTTTTCTAACACTTCATTATATCA 
CATTTTTAATGAATTTCGATAGGGAAATCCATACCAAAACAAGATAGACGAGTGTCCATC 
TTGTTGATCTCATTCATAACGAAGGGCTTCAATTGGATCAAGTTTCGATGCCTTGTTGGC 
TGGCAAGACTCC 

ORF Predictions: 

ORF # Start End Direction Length 



1 197 670 F 158 aa 

2 612 1304 F 231 aa 

>[SEQ ID NO: 143] 3864184-1 ORF translation from 197-670, direction F
VIFISTLSLGGLAHLLWFSLPLAACLAVGAALGPTDLVAFASLSERFSFPKRVSNILKGE
GLLNDASGLVAFQVALTAWTTGAFSLGQASSSLIFSILGGFLIGFLTAMTNRFLHTFLLS 
VRATDIASELLLEFEFASSDLLSGRRSPCFRDYCRRS* 

Description: 
unknown 

>[SEQ ID NO: 144] 3864184-2 ORF translation from 612-1304, direction F 
VTFFLAEEVHVSGIIAVVVDRILKASRFKKITLLEAQVDTVTETVWHTVTFMLNGSVFVI

LGMELEMIAEPILTNPIYNPLLLLLSLIALTFVLFVIRFIMIYGYYAYRTRRLKKKLNKY 
MKDMFLLTFSGVKGTVSIATILLIPSNLEQEYPLLLFLVAGVTLVSFLTGLLVLPHLSDE 
EEESKDYLMHIAILNEVTLELEKELEDTRNKLPLYAAIDNSIMDVLKISF* 

Description: 
unknown 

Assembly ID: 3864194 
Assembly Length: 1941bp 

>[SEQ ID NO:51] 3864194 Strep Assembly -- Assembly id#3864194 
AATTAGTATTCTCAACCTTTTTATCTTGATAGTTCAAGATGGCATTCGTTGAATTGGTAA 
CATAGTAACTATCCACTCCCTTCAGTTTAGCTGCCTCTTGAACCCAGGATTCTTGCGGTT 
TTGGCGGTTCAACAGGAATTCTTTTTCTTTTCCAGAAACCGTAAAAGCTGATTGTTTCTG 
AGTAAAAGACCCATCTTTACTTTTTTTAGGAGAGAAAAAGACGCTAATATTTTTCTGAGA 
TTTAGTCATATCTTTATTGACTTGACGAGATAGGGAATCACCCAAAGCCATAATCACAAC 
AACTGATGAAACACCGATAATAATCCCAATCATAGTAAGCAAAGAACGCATCTTGTGAGC 
CATGATAGATGAAAAGGCAAATTTCAGATTCTGCATCTTAGTTTTCCTCCTTTCCTAACT 
GAGCACTGTCAGACGAAATGACCCCATCCCGAATGACAATCTGACGTTTGGCATAGGCAG 
CAATCTCAGGCTTCATGCGTTACCATGATAATGGTTTTTCCTTCTTTATTCAAATCAACC 
AATAATTGCATAATTTGGTTACCTGTTTTGGTATCCAAGGCTCCTGTCGGTTCATCCGCT 
AGGATAATAGAAGGATTGTTTACCAAGGCACGCGCAATGGCTACACGTTGCTTTTGACCA 
CCAGATAATTCTGAAGGTAAATGGTGACTACGTTCTATCAATTCAACCTTGTCTAAATAT 
TCCTCAGCCAACTTGCGACGTTTTGAAGACGAAACTCCTGCGTAAATCAAGGGCAATTCT 
ACATTTTGCAGAGCATTGAGCTTCGATAGAAGAAAGAACTGCTGAAAGACAAAACCGATT 
TGTTGGTTACGGACCTTAGCTAGTTGTTTTTCACCAAGCCCAGCCACTTCTTGACCTTCA 
AGATAATATTCTCCACTGGTTGGTGTATCCAACATGCCAATCGTATTCATCAGAGTGGAC 
TTACCAGACCCAGATGGTCCCATGATGGCTACAAATTCACCCTCATTCACTTCTAGATTG 
ATATTTTTGAGAACCTGCAGTTCTTGGTCACCATTACGGTAACTTCTGAAGATATTTTTT 
AGACTAATTAGTTGCTTCATCAGCCTTCACCTCTTTTCCTTCTTCCAAGGAAGATGTTGG 
ATTACTGATGACCTTAGCACCGTTCGTTAAACCAGAAGTGATTTCTTGATTTTCTGCGTC 
AGCATTTCCCAATGAAACCTCAACTTTTTTAGCCTTTTGTTGTTCATCCACAATCCAGAC 
ATAATTTTTACTATCATCCATTACTAGACTGCTAACAGGAACAAGAATAGCCTTAGTTTT 
GCTTTTAACCTCAATGTTGACAGAAAAACCTTGTTTCAAATCACCAACCTCGCCTGTCAC
ATCAATAGTATAAGGGTATTTAGAACCTGTATTATTCCCGGCTGCTGGACTAGCTGCTTC 
ACCATTGTTTTTAGGATAGTCAGAAATATAGGCTTAATTTCCCAGTCCATTTTTTATCAG 
GATACACTTTAGAAGTAAAGCTTACTTCTTGACCTACAGAAAGGTTGGCTAGATTGTACT 
CAGACAATTCTCCCTTGACTTGTAAATTTTCATTGCTGACAATATGAACCATAACTTGAC 
TCGCCCCTGTTGGAGATTTAGAAACATTGCTATTGACTTCGACTACAGTTCCCTCTAGGG 
TACTGAGAACAGTTGTTGCATCCAATTGACTTTGAGCCTTGCTTAATTGCGCTGCAGCAT 
CTGCACGCGCATCACGGGCATCACCCAATTGAGCATCAATAGAAGCAACAGAATTTCCAG 
CCACTGGAGTTGGGCTTTGCACCGTTGCATCTTCTCCTCCTACTGGCGCTGGTAACTGTG 
GAGCCTGAGCTGAAGCGGCTTCATTTCGTGCTTGATTGAGTTCATTGATATGACGATCTG 
CCTTAGCTACTGCTCGACTAG 

ORF Predictions:

ORF # Start End Direction Length



1 1084 1380 R 99 aa 

>[SEQ ID NO: 145] 3864194-3 ORF translation from 1084-1380, direction R 
VTGEVGDLKQGFSVNIEVKSKTKAILVPVSSLVMDDSKNYVWIVDEQQKAKKVEVSLGNA
DAENQEITSGLTNGAKVISNPTSSLEEGKEVKADEATN*

Description:
unknown 

Assembly ID: 3864338 
Assembly Length: 1335bp 

>[SEQ ID NO: 52] 3864338 Strep Assembly -- Assembly id#3864338
ATCGAATTCCCTATTTTAACACTTTCTTTTCTAAAACAGTCTATATTTTATTTCAAACTG 
TATTATATTTTTGAAAAAATAAAGTCCTTTTTTCTTTTTTTCAGAAAAAAGGGTATAATA 
AAAGAAAATAAGCAGTAACACTCAATGGAAATCGAAAAAGCAAACTAGGAAGCTAGCCGC 
AGATTGCTCAAAACACTGTTTTGAGGTTGCAGATAGAGCTGACGTGGTTTGAAGAGATTT 
TCGAAGAGTATAAAAAGGTGCTAGGCATGTTGATTTTTCCTTTGTTAAATGATTTGTCAA 
GAAAAATCATCCATATTGGACATGGATGCCTTTTTTGCTGCAGTGGAAATCAGGGATAAT 
CCTAAACTCAGAGGAAAACCTGTCATTATTGGAAGCGACCCTCGGCAAACAGGTGGACGG 
GGAGTCGTTTCTACCTGTAGTTATGAGGCAAGAGCTTTTGGTGTCCATTCTGCCATGAGT 
TCCAAGGAAGCTTATGAACGTTGTCCCCAGGCTGTCTTTATCTCAGGGAATTCGATGAGA 
AATACAAGTCTGTGGGACTCCAGATTCGAGCTATTTTTAAGCGCTATACAGATTTGATTG 
AACCCATGAGCATTGACGAAGCCTATTTGGATGTGACAGAAAATAAACTCGGTATCAAGT 
CAGCGGTCAAAATTGCTCGCCTCATTCAAAAAGATATCTGGCAAGAACTCCATCTAACTG 
CTTCCGCAGGCGTTTCTTACAACAAATTCTTAGCTAAAATGGCGAGTGATTATCAAAAAC 
CACATGGTTTGACAGTGATTCTACCTGAACAGGCTGAGGATTTTCTCAAACAAATGGATA 
TTTCCAAATTTCATGGAGTAGGAAAAAAGACAGTAGAACGTCTTCATCAAATGGGCGTTT 
TTACTGGTGCTGATTTACTTGAAGTTCCTGAGGTAACCCTAATAGACCGTTTTGGTAGAC
TAGGCTATGATCTGTATCGAAAGGCTCGTGGCATTCACAACTCTCCAGTCAAATCCAATC 
ACATCCGTAAATCAATCGGCAAGGAGAAAACCTACGGGAAGATTCTCCGTGCTGAGGAAG 
ATATCAAAAAAGAGAGCTGACTCTTCTATCAGAAAAAGTCGCTCTCAATCTACATCAACA 
AGAAAAAGCTGGAAAAATTGTCATTTTGAAAATCCGCTACGAGGACTTTTCAACTCTTAC 
CAAACGAAAAAGTATTGCTCAAAAAACACAAGATGCTAGTCAGATAAGCCAAATAGCCCT 
GCAACTCTATGAAGAATTAAGTGAGAAAGAAAGAGGTGTCCGCCTATTGGGGATTACCAT 
GACTGGATTTTAAAG 

ORF Predictions:

ORF # Start End Direction Length 



1 552 1100 F 183 aa 

>[SEQ ID NO:146] 3864338-2 ORF translation from 552-1100, direction F 

VGLQIRAIFKRYTDLIEPMSIDEAYLDVTENKLGIKSAVKIARLIQKDIWQELHLTASAG

VSYNKFLAKMASDYQKPHGLTVILPEQAEDFLKQMDISKFHGVGKKTVERLHQMGVFTGA 

DLLEVPEVTLIDRFGRLGYDLYRKARGIHNSPVKSNHIRKSIGKEKTYGKILRAEEDIKK 

ES* 

Description: 

EGODINJ NCBI - Escherichia coli (substrain W3110, strain K-12) 
DinP, DNA damage inducible protein 

Assembly ID: 3864360 
Assembly Length: 1796bp

>[SEQ ID NO: 53] 3864360 Strep Assembly -- Assembly id#3864360 

TCCAAGCTAGCTATTTCGTGGAAGGGGCTTCGGTTGGCAGAACCTGGTGAATTTACCCAA 

ACGTGCTTTTTTAAACGGTCGCGTAGACTTGACACAGGCAGAGGCTGTGATGGATATCAT 

CCGTGCCAAGACTGACAAGGCCATGAACATTGCGGTCAAACAATTAGACGGCTCCCTTTC 

TGACCTCATTAACAATACCCGTCAAGAAATCCTCAATACACTTGCCCAAGTTGAGGTCAA 

TATCGACTATCCTGAATATGATGATGTTGAGGAAGCTACTACTGCCGTTGTCCGTGAGAA 

GACTATGGAGTTTGAGCAATTGCTAACCAAGCTCCTTAGGACAGCACGTCGTGGTAAAAT 

CCTTCGTGAAGGAATTTCAACGGCTATCATTGGACGTCCCAACGTTGGGAAATCAAGCCT 

TCTCAACAACCTCTTGCGTGAGGACAAGGCTATCGTAACCGATATCGCTGGGACAACACG 

AGATGTCATCGAAGAGTACGTCAACATCAATGGTGTTCCTCTAAAATTGATTGACACAGC 

TGGTATTCGTGAAACGGATGATATCGTTGAACAAATCGGTGTTGAGCGTTCGAAAAAAGC 

CCTCAAGGAAGCCGACTTGGTTCTACTAGTGCTAAATGCCAGTGAACCACTGACTGCGCA 

AGACAGACAACTTCTTGAAATTAGCCAAGATACCAATCGCATTATTCTACTTAATAAAAC 

CGACCTGCCAGAAACGATTGAAACTTCGAAACTACCTGAAGACGTTATCCGTATTTCAGT 

CCTTAAAAACCAAAACATCGACAAGATTGAAGAGCGAATCAACAACCTCTTCTTTGAAAA 

TGCTGGCTTGGTCGAGCAAGATGCTACTTACTTGTCAAACGCCCGTCACATTTCCCTGAT 
TGAAAAAGCAGTTGAAAGCCTACAAGCCGTTAATCAAGGTCTTGAGCTGGGGATGCCAGT
TGATTTGCTTCAAGTTGACTTGACTCGTACTTGGGAAATCCTCGGAGAAATCACTGGGGA 
TGCTGCTCCAGATGAACTCATCACCCAACTCTTTAGCCAATTCTGTTTAGGAAAATAAGA 
AAAATCCATGATCCTTCATTCGGTCATGGATTTTATTGTCTTTATTAGTAATCTGGTCTT 
AAGACCCCTGTTACAGTTGCCTTAGTTGCTTCGTAGTCGCCATCTACGACAACCTTGATA 
ATGCGTTTGACATCTTCTTCTGGTGCTGGAACAAGAGGTAGACGAGTGGGTCCAGCTTCA 
AATCCCATATAGTTAAGAATTGCCTTAACTGGAGCAGGACTTGGATAAGAGAAGAGAGCA 
TTAACCTTAGGAATGAATTTACGCTGAATTGCTGCGGCTTTCTTCATATCGCTTTCTGCA 
ATGGCAGTAAACATCTCGTGCATTTCATCCCCATTTGTATGAGAGGCAACAGAAATAACC
CCATCCGCCCCAAGGTTCATGGCATGGAAAGCATCTCCATCCTCACCTGTATAAATCAAG 
AACTCTTCAGGCTTGTGCTCAATCAAGTAAGCCATATTAGCCAAGCTAGTACATTCTTTG 
ACACCGATAATATTTGGATGGTCAGCCAAGCGAAGCATGGTTTCTGGAGTCAATTCGACA 
ACTACACGCCCTGGAATGTTATAGATAATAATTGGTAGGTCAGAAGCATCTGCAATAGCC 
TTAAAGTGCTGATACATCCCTTCTTGAGAAGGTTTGTTGTAGTAAGGAACAATAGCAAGC 
CCAGCTGCGAAACCACCAAATTCCGCTACTTCTTTGACAAACTCAATAGAGTCACG 

ORF Predictions: 

ORF # Start End Direction Length 



1 47 1078 F 344 aa 

>[SEQ ID NO: 147] 3864360-1 ORF translation from 47-1078, direction F 
VNLPKRAFLNGRVDLTQAEAVMDIIRAKTDKAMNIAVKQLDGSLSDLINNTRQEILNTLA

QVEVNIDYPEYDDVEEATTAVVREKTMEFEQLLTKLLRTARRGKILREGISTAIIGRPNV
GKSSLLNNLLREDKAIVTDIAGTTRDVIEEYVNINGVPLKLIDTAGIRETDDIVEQIGVE
RSKKALKEADLVLLVLNASEPLTAQDRQLLEISQDTNRIILLNKTDLPETIETSKLPEDV 
IRISVLKNQNIDKIEERINNLFFENAGLVEQDATYLSNARHISLIEKAVESLQAVNQGLE 
LGMPVDLLQVDLTRTWEILGEITGDAAPDELITQLFSQFCLGK* 

Description: 

THIOPHENE AND FURAN OXIDATION PROTEIN THDF. - ESCHERICHIA COLI.

Assembly ID: 3864388 
Assembly Length: 2337bp

>[SEQ ID NO: 54] 3864388 Strep Assembly -- Assembly id#3864388

CTTCGTACAGGTGGTTCCTATGCAAGGGTGGAAGCCAATCGTCAGAACAACAAGCATCTT 

CATCAAGCCAGAACTGGAGCAATTACAAAAAGAAATTGCTGAAGAAGAAGCAAGCTTGGG 

TTCAGAAGAAGTGGCTTTGAAGACCTTGCAAGATGAGATGGCCAGATTGACCGAGTCATT 

AGAAGCTATTAAATCTCAAGGAGAGCAGGCACGTATTCAGGAGCAAGGCTTGTCCCTCGC 

TTATCAGCAAACTAGTCAGCAAGTTGAAGAACTGGAAACTCTTTGGAAACTCCAAGAAGA 

GGAAATAGATCGTCTTTCCGAGGGAGATTGGCAAGCGGATAAGGAAAAATGCCAAGAGCG 

TCTTGCTGCAATCGCCAGTGACAAGCAAAATCTGGAAGCTGAGATTGAAGAGATTAAGTC 
TAATAAAAATGCCATCCAAGAACGCTATCAAAACTTGCAGGAAGAGCTAGCGCAAGCTCG
TTTGCTTAAGACAGAACTGCAAGGGCAAAAACGTTATGAAATTGCTGATATTGAACGCTT 
AGGCAAGGAATTGGACAATCTTGATTTTGAACAAGAGGAAATCCAGCGCCTTCTTCAAGA 
AAAGGTTGACAATCTTGAGAAGGTTGATACAGAATTGCTCAGTCAACAGGCGGAAGAATC 
CAAAACTCAGAAAACGAACCTCCAACAAGGTTTGATTCGCAAACAGTTTGAGTTGGATGA 
TATAGAAGGTCAGCTGGATGATATTGCTAGTCATTTGGATCAGGCTCGCCAGCAGAATGA 
GGAGTGGATTCGCAAGCAAACACGTGCTGAAGCTAAGAAAGAAAAGGTCAGCGAGCGCTT 
TGCCGCCATCTACAAAGTCAATTAACAGACCAGTACCAGATTAGCCATACTGAAGCTCTA 
GAAAAAGCGCATGAATTGGAAAACCTCAATCTGGCAGAGCAAGAAGTTAAGGATTTAGAG 
AAGGCTATTCGCTCACTGGGTCCTGTCAATATAGAAGCTATTGACCGGTACGAAGAAGTT 
CACAACCGTCTGGACTTTCTAAATAGTCAGCGAGATGATATTTTGTCAGCGAAAAATCTG 
CTCCTTGAAACCATTACAAAGATGAATGATGAGGTTAAGGAACGCTTTAAATCAACCTTT 
GAAGCTATTCGTGAGTCCTTTAAAGTGACCTTCAAGCAGATGTTTGGCGGAGGTCAGGCA 
GACTTGATATTGACTGAGGGCGACCTTTTACAGCTGGTGTGGAGATTTCTGTTCAACCTC 
CAGGTAAGAAAATCCAGTCGCTTAACCTCATGAGTGGTGGTGAAAAAGCCCTATCGGCTC 
TTGCCTTGCTTTTCTCCATTATTCGTGTCAAGACCATTCCTTTTGTCATCTTGGATGAGG 
TGGAAGCTGCGTTGGATGAAGCCAATGTTAAACGTTTTGGGGATTACCTCAACCGCTTTG 
ACAAGGACAGCCAGTTTATCGTCGTAACCCACCGTAAGGGAACCATGGCAGCGGCCGATT 
CCATCTATGGAGTGACCATGCAAGAATCGGGTGTTTCAAAGATTGTTTCAGTTAAGTTAA 
AAGATTTAGAAAGTATTGAAGGATGACAATTAAACTAGTAGCAACGGATATGGACGGAAC 
CTTCCTAGATGAGAATGGGCGCTTTGATATGGACCGCCTCAAGTCTCTCTTGGTTTCCTA 
CAAGGAAAAAGGGATTTACTTTGCGGTGGCTTCGGGTCGGGGATTTCTGTCTCTGGAAAT 
CGAATTATTTGCTGGTGTTCGTGATGACATTATTTTCATCGCGGAAAATGGCAGTTTGGT 
AGAGTATCAAGGTCAGGACTTGTATGAAGCGACTATGTCTCGTGACTTTTATCTGGCAAC 
TTTTGAAAAGCTGAAAACGTCACCTTATATAGATATCAATAAACTGCTCTTGACGGGTAA 
GAAGGGTTCATATGTTCTAGATACGGTTGATGAGACCTATTTGAAAGTGAGTCAGCATTA 
TAATGAAAATATCCAAAAAGTAGCGAGTTTGGAAGATATCACAGATGACATTTTCAAATT 
TACAACCAACTTCACAGAAGAAACGCTAGAAGCTGGTGAAGCTTGGGTCAATGATAATGT 
CCCTGGTGTCAAGGCTATGACAACTGGCTTTGAATCTATTGATATTGTTCTGGACTATGT 
CGATAAGGGTGTAGCTATTGTTGAATTAGCTAAAAAACTTGGCATCACAATGGATCAGGT 
CATGGCTTTTGGAGACAATCTTAATGACTTACATATGATGCAGGTTGTGGGACATCCTGT 
AGCTCCTGAAAATGCACGACCAGAGATTTTAGAATTAGCATAAGACTGTGATTGGTC 

ORF Predictions:

ORF # Start End Direction Length 



1 1239 1586 F 116 aa 

>[SEQ ID NO:148] 3864388-3 ORF translation from 1239-1586, direction F 
VEISVQPPGKKIQSLNLMSGGEKALSALALLFSIIRVKTIPFVILDEVEAALDEANVKRF
GDYLNRFDKDSQFIVVTHRKGTMAAADSIYGVTMQESGVSKIVSVKLKDLESIEG*

Description: 
P115 protein - Mycoplasma hyorhinis (SGC3) (similarity to SMC1_YEAST /
chromosome segregation protein)

Assembly ID: 3864406 
Assembly Length: 2162bp 

>[SEQ ID NO: 55] 3864406 Strep Assembly -- Assembly id#3864406

CTAAAAGTGAAGCCCGATAGCGTCTCTCTCCTGCAAGGATTTCATAACCAATAACAGGAG 

ATTGACGAACAATAATCGGTTGAATGACCCCATTTTCTTTGATAGACTGTGCTAGTTCAT 

CTAGCTTTTCTCTATCAAATTCTTTTCGGGGTTGATAGGGATTTTTTTGTATATCTGTGA 

TAGAAATCATTTCAAATTTTTCCATGATTCTACACTAACACATCTTTTCTCTTATGTAAA 

GCTTTCTTTACATAGATGTCAATTAAGATTCTAAATCACCTGAACTCTTGTTAAGTTTGA 

TAGAGGTAGTTTCTTCTTTCCCGTTACGATAGTAGGTTATCTTAATGGTGTCTCCGATAG 

AATGGTTGTAAAGAGCACTTTGTAAGTCTGTTGATGAAGCAATCTCTTTGTCATCTACTT 

TTGTAATTACATCGTATTTTTCAAGGTGACCATTGGCAGGCATATTACTTTGTACCGAAC 

GAACAATTACACCAGATGTAACATTACTTGGAATATTGAGTCTTCTGATGTCGCTTGTAC 

TCACATTAGATAAATTAACCATCTGGATTCCCAAAGCTGGACGCGTCACTTTTCCGTTTT 

TTTCTAACTGTTCAATAATATTGATAGCATCATTTGCAGGAATTGCGAAACCAAGACCTT 

CTACAGATGTTCCTCCATTTGTAGCAATTTTACTTGAGGTAATTCCGATAACCTGCCCTT 

GAATATTGATCAGTGGGCCGCCAGAGTTACCTGGGTTAATAGCAGTATCAGTTTGGATGG 

CTTTTGTAGAAATAGCTTGTCCATCTTCCGATTTTAAGGATACATTTCTATTGAGACTGG 

ATACGATACCTTGAGTGACAGTATTTGCATATTCAGAACCTAACGGGCTACCGATGGCAA 

TAGCAGTTTCTCCTACAGTTAACTTACTAGAATCACCAAACTCAGCTACTGTTGTCACTT 

TTTCTGAAGAGATTTCGACGACAGCAATATCAGAGAAAGTGTCAGCTCCGACAATTTCTC 

CAGGTACTTTAGTCCCATCTGACAATCGAATATCTACTTTGCTGGCGCCATTTATAACGT 

GATTGTTGGTGACGATGTAAGCTTCTTTATCATTCTTTTTATAAATAACTCCAGATCCTT 

CACTAGAGATTCGCTGAGAATCTGTGTCAGTATCATCATTGCCAAATACGCTATTTTGTC 

TGTTTGCCGAATAAGTAATAACAGAAACAACAGCATCTTTTACTTTGTTAACGGCCTGTG 

TTGTTGAATTTTCCGTTCCTTATAGGCAGTTTGTGTAATAGTACTATTGTTGTTAGAGTT 

GTTTACACTACTTTTTTGAGTTAGTTGAGTTATTGAAAAACTACCCAAGGCTCCACTAAA 

AAAGCTAATGACGATAACGACTAATAATTGAAACCATTTTTTGTAAAATGTTTTTAGATG 

TTTCATATTTGCCTCCATATGTTTGAATTACTGAAAGTATAAACTGACTAGCTTAATTAT 

AACTTAAACACAAAAGTTTTACACAAACTGTGGATAACTCTTTTGAAACTGTGATTTTCT 

TAATTGAAATCTATTTTTTATTTTGTGAATAAGATGTGAAAAAATAGAGAATATGTTAGA 

ATAGAGTCATGAAAATTAAAGTTGTAACAGTTGGGAAACTGAAAGAAAAGTATTTAAAAG 

ATGGTATCGCAGAGTATTCAAAACGAATTTCTAGATTTGCTAAGTTTGAAATGATTGAGT 

TATCAGATGAAAAAACACCAGATAAGGCCAGTGAATCAGAAAATCAAAAGATTTTAGAAA 

TAGAAGGTCAGAGAATTTTATCAAAAATTGCTGACCGTGATTTCGTTATTGTGTTAGCCA 

TTGAAGGGAAAACTTTCTTCTCAGAAGAATTTAGTAAGCAGTGAGAAGAAACTTCTATAA 

GGAAGGATGTCTACTCTTACTTTTATTATTGGGGGAAGTTTAGGATTGTCATCATCTGTA 

AAAAATAGAGCCAATCTTTCTGTCAGTTTTGGTCGCCTAACCTTGCCTCATCAGTTAATG 

AGACTAGTTCTTGTTGAACAAATCTATCGCGCTTTTACGATTCAGCAGGGATTCCCCTAC 

CATAAATAGAGAATTGACTTTTAATTGAATTTTTGGTAGAATAATTGTGTTAGGTCTCAT 
AG

ORF Predictions:

ORF # Start End Direction Length 



1 263 958 R 232 aa 

>[SEQ ID NO: 149] 3864406-1 ORF translation from 263-958, direction R 

VTTVAEFGDSSKLTVGETAIAIGSPLGSEYANTVTQGIVSSLNRNVSLKSEDGQAISTKA 

IQTDTAINPGNSGGPLINIQGQVIGITSSKIATNGGTSVEGLGFAIPANDAINIIEQLEK

NGKVTRPALGIQMVNLSNVSTSDIRRLNIPSNVTSGVIVRSVQSNMPANGHLEKYDVITK 

VDDKEIASSTDLQSALYNHSIGDTIKITYYRNGKEETTSIKLNKSSGDLES* 

Description: 

Bacillus subtilis (strain 168, ) DNA. Homologous to E. coli serine 
protease HtrA (BLAST) 

Assembly ID: 3864452 
Assembly Length: 1766bp

>[SEQ ID NO: 56] 3864452 Strep Assembly -- Assembly id#3864452

ATCGAATTTTCCAAAATGGGGAGCTAGAGCAGTGGAGTGATTATGTGGCAGACGATTTGA 

TTCAGCATAATCATGAGATTGGACAAGGAAGTGCTGCTTATAAAAACTATGTGGCTGAAT 

ATATTGTCACTTTTGACTTCGTTTTCCAACTCTTAGGACAAGGAAACTATGTGGTTAGCT 

ATGGTCAGACTCAGATTGATGGCGTTGCTTATGCCAAGTACGATATCTTCCGTTTAAAGA 

ACGGGAAAATTGTGGAGCATTGGGATAATAAGGAAGTCATGCCTAAGGTAGAAGACTTGA 

CCAATCGAGGGAAGTTTTAAATTGAGGACAAAGAATGATTGAATACAAAAATGTAGCACT 

GCGCTACACAGAAAAGGATGTCTTGAGAGATGTCAACTTACAGATTGAGGATGGGGAATT 

TATGGTTTTAGTAGGGCCTTCTGGGTCAGGTAAGACGACCATGCTCAAGATGATTAACCG 

TCTTTTGGAACCAACTGATGGAAATATTTATATGGATGGGAAGCGCATCAAAGACTATGA 

TGAGCGTGAACTTCGTCTTTCTACTGGTTATGTTTTACAGGCTATTGCTCTTTTTCCAAA 

TCTAACAGTTGCGGAAAATATTGCTCTCATTCCTGAAATGAAGGGGTGGAGCAAGGAAGA 

AATTACGAAGAAAACAGAAGAGCTTTTGGCTAAGGTTGGTTTACCAGTAGCCGAGTATGG 

GCATCGCTTACCTAGTGAATTATCTGGTGGAGAACAGCAACGGGTCGGTATTGTCCGAGC 

TATGATTGGTCAGCCCAAGATTTTCCTCATGGATGAACCCTTTTCGGCCTTGGATGCTAT 

TTCGAGAAAACAGTTGCAGGTTCTGACAAAAGAATTGCATAAAGAGTTTGGGATGACAAC 

GATTTTTGTAACCCATGATACGGATGAAGCCTTGAAGTTGGCGGACCGTATTGCTGTCTT 

GCAGGATGGAGAAATTCGCCAGGTAGCGAATCCCGAGACAATTTTAAAAGTGCCTGCAAC 

AGACTTTGTAGCAGACTTGTTTGGAGGTAGTGTTCATGACTAATTTAATTGCAACTTTTC 

AGGATCGTTTTAGTGATTGGTTGACAGCTACAATGACATTGGTCGGTTCCTTGAGCAAGA 

GATAGATTAGCCAGACAGTCATGCCCAAAATCCCTCCAGGTAAGAGCATAGACCGTTGCA 

CATTAAGTACGATTAAAAAAGTGATAATGGCAAGAAAACTTGCTACTGCTTGTAATAAAA 

AGGTTGTTAGTGTCATATTAGTTCATCAATACCAAGGCGACAGAAGTTCCTGCCCCTAAA 
GCGAGGGTAATGAGCAGGGATTCAAACATCTTACTCATACCAGAGTTTATGTGGTTGGTC
ATAATATCACGGACCGCATTGGTCAAGGCAATACCTGGTACAAACGGCATGACCGCACCA 
GCTATAATCAAATCTGCCGTTGAAGGAAAACCTGTGTAGCGAGCCCAAAACTGGGCAATT 
ATCCCAAAGACAAAAGCTCCAGCAAAGGCTGTCACAAAGGGAATTCGGATAAATTTTTCC 
ACATAGAGGGAAAAGGCAAAACCAAATAAGGTCGCCACTCCTGCCCCAAGTGCGTCGTAG 
ATATTTCCGCTAAACATAACTGAAAAGAAAGGAGCACTAAAGGTCGCAGCCAGAGTTACC 
TGCAACTTAGTATAGGGAAGGGGTTGAGCTTGCAAGGCCGTCAATTGCTTAAAGGCTGTT 
TCTAAGTCAATCTGCCCCCCAACTGG 

ORF Predictions:

ORF # Start End Direction Length 



1 1079 1201 R 41 aa 

>[SEQ ID NO: 150] 3864452-2 ORF translation from 1079-1201, direction R 
VQRSMLLPGGILGMTVWLIYLLLKEPTNVIVAVNQSLKRS*

Description:
unknown 

Assembly ID: 3864458 
Assembly Length: 1705bp 

>[SEQ ID NO: 57] 3864458 Strep Assembly -- Assembly id#3864458 

CTCTGACGGAGGCTGGTTATGTGGGTGAGGATGTGGAAAATATACTCCTCAAACTCTTGC 

AGGTTGCTGACTTTAACATCGAACGTGCAGAGCGTGGCATTATCTATGTGGATGAAATTG 

ACAAGATTGCCAAGAAGAGTGAGAATGTGTCTATCACACGTGATGTTTCTGGTGAAGGGG 

TGCAACAAGCCCTTCTCAAGATTATTGAGGGAACTGTTGCTAGCGTACCGCCTCAAGGTG 

GACGCAAACATCCACAACAAGAGATGATTCAAGTGGATACAAAAAATATCCTCTTCATCG

TGGGTGGTGCTTTTGATGGTATTGAAGAAATTGTCAAACAACGTCTGGGTGAAAAAGTCA 

TCGGATTTGGTCAAAACAATAAGGCGATTGACGAAAACAGCTCATACATGCAAGAAATCA 

TCGCTGAAGACATTCAAAAATTTGGTATTATCCCTGAGTTGATTGGACGCTTGCCTGTTT 

TTGCGGCTCTTGAGCAATTGACCGTTGATGACTTGGTTCGCATCTTGAAAGAGCCAAGAA 

ATGCCTTGGTGAAACAATACCAAACCTTGCTTTCTTATGATGATGTTGAGTTGGAATTTG 

ACGACGAAGCCCTTCAAGAGATTGCTAATAAAGCAATCGAACGGAAGACAGGGGCGCGTG 

GACTTCGCTCCATCATCGAAGAAACCATGCTAGATGTTATGTTTGAGGTGCCGAGTCAGG 

AAAATGTGAAATTGGTTCGCATCACTAAAGAAACTGTCGATGGAACGGATAAACCGATCC 

TAGAAACAGCCTAGAGGTGACTATGGAACTTAATACACACAATGCTGAAATCTTGCTCAG 

TGCAGCTAATAAGTCCCACTATCCGCAGGATGAACTGCCAGAGATTGCCCTAGCAGGGCG 

TTCAAATGTTGGTAAATCCAGCTTTATCAACACTATGTTGAACCGTAAGAATCTCGCTCG 

TACATCAGGAAAACCTGGTAAAACCCAGCTCCTGAACTTTTTTAACATTGATGACAAGAT 

GCGCTTTGTGGATGTGCCTGGTTATGGCTATGCTCGTGTTTCTAAAAAGGAACGTGAAAA 

GTGGGGGTGCATGATTGAGGAGTAATTTAACGACTCGGGAAAATCTCCGTGCGGTTGTCA 
GTCTAGTTGACCTTCGTCATGACCCGTCAGCAGATGATGTGCAGATGTACGAATTTCTCA
AGTATTATGAGATTCCAGTCATCATTGTGGCGACCAAGGCGGACAAGATTCCTCGTGGTA 
AATGGAACAAGCATGAATCAGCAATCAAAAAGAAATTAAACTTTGACCCAAGTGACGATT 
TCATCCTCTTTTCATCTGTCAGCAAGGCAGGGATGGATGAGGCTTGGGATGCAATCTTAG 
AAAAATTGTGAGGAAAAGAAAATGGCAAAAACAATTCATACAGATAAGGCCCCAAAGGCT 
ATCGGGCCCTATGTTCAAGGAAAAATCGTTGGCAACCTTTTGTTTGCTAGCGGTCAAGTT 
CCCCTATCCCCTGAAACTGGGGAAATTGTAGGAGAGAATATCCAAGAACAGACAGAGCAA 
GTCTTGAAAAACATCGGTGCTATTTTGGCAGAAGCAGGAACAGACTTTGACCATGTTGTC 
AAAACAACTTGTTTCTTGAGCGATATGAACGACTTTGTTCCTTTTAATGAGGTTTACCAA 
ACGGCCTTCAAAGAGGAATTCCCAG 

ORF Predictions: 

ORF # Start End Direction Length 



1 797 1105 F 103 aa 

2 1179 1391 F 71 aa 

>[SEQ ID NO:151] 3864458-2 ORF translation from 797-1105, direction F 
VTMELNTHNAEILLSAANKSHYPQDELPEIALAGRSNVGKSSFINTMLNRKNLARTSGKP
GKTQLLNFFNIDDKMRFVDVPGYGYARVSKKEREKWGCMIEE*

Description:
unknown 

>[SEQ ID NO:152] 3864458-3 ORF translation from 1179-1391, direction F 

VQMYEFLKYYEIPVIIVATKADKIPRGKWNKHESAIKKKLNFDPSDDFILFSSVSKAGMD 

EAWDAILEKL* 

Description: 

HYPOTHETICAL 22.0 KD PROTEIN IN LON-HEMA INTERGENIC REGION (ORFX). -
BACILLUS SUBTILIS.

Assembly ID: 3864474 
Assembly Length: 1673bp 

>[SEQ ID NO: 58] 3864474 Strep Assembly -- Assembly id#3864474
ACGTTTTGGGAACTGTTCGGATAGCAGATTCCGAACAAACTGATAATGGTTGGCAAAATC 
ATTATTCCTAATAGTAACGAAGCTGGTTAGGACAACTCATGCCATTTCCTAAAAAGGTTT 
TAATCCAAGGCACCAATAATTGTAGGCCGAAAAAACCATAAACAATAGATGGAATGGCTG 
CCATCAAGTTGATAGCTGATTTTAAGAAGCTATAGACGGGCTTTGGACAATTATAAACCA 
TAAACACCGATGTCAAGATCGCCTGTTGGCACCCCAATCACAATCGCTCCTAAGGTCGAA 
TAAATAAGGAACCAACGATCATTGGTAAAATACCATAGCTTGCCGGAATGTTCGTTGGCG 
ACCAATCACTGCCTAATAAAAAACGGGCAAAGCCGTAGTTAGCTATGAAAGGTAAGCCAT 
TACTAAAAATAAAGAAACAGATTAGCAAAATAGCTACAACAGCTACTGTTGCACTCATGA
AAAAAATTGCCCTAAAAACTGCTTCTTTGAAGGCTTGTTTTGTCACATCTTGTCCTTTCT 
AGTGAAGAAAGTAAGGGAGATACGACACCTCCCTACTTGCCTTCTTTATCTTATTGTACG 
ATGAAACGTCTGCATCTCTTTAGAGATTTATGGAGCAAACATTTTATTTAATCTTGTCCC 
AGGTGGTTAATTTGCCACTAAAAACGTCCGCAAGTTCAGCCATACTGACTTGGCTTGCCT 
TATTGTCATTATTGACCACAACAGCAATACCGTCTAAAGCAATAGCATCATGGGTGAGAC 
TCTTACCTTCTTCAGGAGTTAATTCCCTAGAAACCATACCAATATCAGCGGTTTTCTCCT 
TAACAGCGGTAATACCTGCTGAAGACCCATTAGAGGTAATATCAATCGTAACTTCTGGAT 
TTTCTTTTTTATAAGCTTCTGCTAATTTTTCCATTAAAGAAGATACTGAAGTGGAACCTA 
CAACAGACAACTTGCCTGATAAGTGTTGGCTTGTATATTCTGTGGTTTCGGTTTTAGCTT 
CAATAAATTTATTATCTGTGACCACTTGTTGACCTTGTTTGGAGTGGATAAAGCTGATAA 
AATCTTGACCTAGCTTGGAAAGATTAGAAGACCAAACAATGTTGAAGGGACGTTGAAGAG 
GGTATTCACCATCTAAAACTGTGTCTCGACTAGCCTTGACACCATCAATCTCTAAAGCCT 
TGACAGATTTCGTTAAAGATCCCAAGGAGATGTAGCCGATAGCATTAGCATTCCCTTGAA 
CTGCTGAGAGAACACCTTCTGTACTATTTTGAATCACAGCTGTTTTGGCAGTGTAGTCAA 
TTTTTTTATCACCGTCTTTTTTGAGAATCCCTGTGATTTCTGTGAAGGCACCCCGTGTTC 
CAGAGCCATTTTCTCGTGAAATCACCTCAATCGTTCCTGGAGCTGACTGTTTGGAAGCAG 
CTGACTGATTGCCACAGGCAACAAGCCCAAATCCTGATAAGCCAATGGCTGCAAGAGTAA 
GCATTTTTTTGAATTTCATAATAATCACCTTTATCTCTATGTATTTTTCTTGTGTAGGCT 
TACTACATTTATAGTCTAACAAGTCTTTGTAAAGGTTTATCCCTGATTCATGTAAAGATT 
GTGTAAAGAATCAAAAAAAGCCACTTTTGAAAAATGGCTGCCCCTAAAAATAG 

ORF Predictions:

ORF # Start End Direction Length 



1 68 247 R 60 aa 

2 644 1528 R 295 aa 

>[SEQ ID NO: 153] 3864474-1 ORF translation from 68-247, direction R 
VFMVYNCPKPVYSFLKSAINLMAAIPSIVYGFFGLQLLVPWIKTFLGNGMSCPNQLRYY*

Description: 

PROBABLE ABC TRANSPORTER PERMEASE PROTEIN (ORF72). - BACILLUS
SUBTILIS. (BLAST) 
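The ORF translations in this listing appear to follow the standard genetic code, with stop codons shown as "*" and with the first codon translated literally: every translation in this section begins with V, consistent with a GTG (GUG) initiator being read as valine rather than rewritten as M. The Python sketch below shows such a codon-by-codon translation, assuming the standard code and an in-frame coding sequence. The names `translate` and `CODON_TABLE` are illustrative, and mapping codons with unresolved bases (such as N) to X is an assumed convention, not one stated in the source.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence codon by codon.

    Every codon, including the first, is looked up in the standard table,
    so a GTG start yields 'V'. Stops become '*'; codons containing
    ambiguous bases such as N become 'X' (an assumed convention).
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(
        CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds), 3)
    )


print(translate("GTGAAACCAGGTTAA"))  # VKPG*
```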

>[SEQ ID NO:154] 3864474-2 ORF translation from 644-1528, direction R 

VIIMKFKKMLTLAAIGLSGFGLVACGNQSAASKQSAPGTIEVISRENGSGTRGAFTEITG

ILKKDGDKKIDYTAKTAVIQNSTEGVLSAVQGNANAIGYISLGSLTKSVKALEIDGVKAS

RDTVLDGEYPLQRPFNIVWSSNLSKLGQDFISFIHSKQGQQVVTDNKFIEAKTETTEYTS

QHLSGKLSVVGSTSVSSLMEKLAEAYKKENPEVTIDITSNGSSAGITAVKEKTADIGMVS

RELTPEEGKSLTHDAIALDGIAVVVNNDNKASQVSMAELADVFSGKLTTWDKIK* 

Description:
probable hemolysin precursor - Streptococcus agalactiae (strain
74-360)

Assembly ID: 3864510 
Assembly Length: 1702bp

>[SEQ ID NO: 59] 3864510 Strep Assembly — Assembly id#3864510 

CTTTTTTATTTCACAACAAGTTCATAACGTGTCTTACTGGTGAAGGTTTGACCAGCTTTA 

AGAATGACTTGGCCTTTAAGGTCACTGTGAATGGCATCTGGTAAAGCTTGCGCTTCAAGA 

GCAATCCCATTGTGCTGTAGCATTGGCTGACCTCCTATGATGACACTTTCATCCACAAAG 

TTTGCTGTGTAGACCACAAAGCAAGGAGCTTCTGTCTTGAAAAGCAGGAAGCGACCTGAA 

TTTTGGTCATAAAGGAATCCAGCATTGTCATGGCCTGCAGGAAGGGCAAATGGATGATCC 

AAACCTGATGCCAGCTGGATTTGCTCATCTTCTTCTGCAAAGATATCCTTCAACAAGGCA 

CCATTGTAGATGTGTTTGACCACATCACGGTTGGCTTCTGGAGTTTTGGCAGGAACACCG 

TCAGGAGCGATTGAGTAAATGCCCTCTGTGTTTAGTTGGAAGACATGACGGTCAATCGTC 

TGCGTGAAATCACCAGACAAGTTGAAATAGCTGTGGTTGGTTGGATTGACCAGCGTATCC 

TGATCGGTCGTTACCTTGTAGATCGAATTCATGGAGGCACCAGTTTCTTCCAAGTGATAA 

CTGATCGCCAAATCTTGAGATTTCCAGGGAACCCTCCTGTCCCATCTGTACGCTCTGTGT 

AGAGAGTCAAGCCATGATCGCTTACTTCTTCAACTTCAAACAAGCTGGAATCCCAACCAG 

TTGAACCACTGTGATTACAGTTGCTAGCATTATTAACCTCAAGGTCATAGGTCTTACCAT 

TGAGCTCAAAGGTCGCACCTGCAATACGACCCGCTACAGGACCTACACTTGCTCCATGCT 

TGGGACTATTGCCTACATAACTATCAAAGTCATCAAATCCCAAGATAACATTGGCAAAAT 

TTCCAGCCTTGTCAGGTGCGACATAGCGCAAGATAGTCGCACCATAAGTCATAACCTCAA 

GTTGGTAGCCACCGTCTGTCTCAAATCGATAGGCCAAGACATCCTCACCCTCAACATTTC 

CAAATACACGCTCTGTGTATGCTTTCATTCTGTTCTCCTTTTACTATTTCTCTCAAGCAA 

ACAAACCATAGAAAGCGTACTGACAATCTATGGTTTATCTGATAATTTACAAATCCTCTT 

GTCAAGAATTCATAAACACTGTCTTACTTTTGATATTCGTGAATTATGACACCTTGTACT 

ACACGGTTTACTGTACCTGTAGGAGACGGTGTATCTGGTTTATTTTCTACCTTGAGTGAA 

GTCAATAGGGCAAAGAGTTGGGCATAAACGATGTAAGGGAAGACACGGTAAATATCATTC 

AAGACACCGCCACAACCAAGGGCCACTTCTTTGACATTTTCAAGACCAAAAGCTTGATCA 

CTCAAAAGCACAACACGACGAGCAATCTGGTCACCAGCAACTTCACGAACCAAGTCCAAG 

TCGTACTTACGAGTGTAGTCCGTCGTTGTACCAAAGACCAAAACAACTGTATTGTCGTTG 

ATAAGAGATTTTGGACCGTGACGGAAGCCAACTGGGCTTTCATACATGGTCGCAACTTGA 

CCAGCAGTTAATTCCAAAATCTTGAGCTGAGCTTCATGAGCAAGTCCAAAGAAAGGACCA 

GCGCCTAGAATAGATGACACGGTTAAAGTCTAAATCAACGAGATCTTTGACATCTTCTGC 

CTTGTCTAAAACTTTACGGGCA 

ORF Predictions: 

ORF # Start End Direction Length 



1 1164 1640 R 159 aa 
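ORFs marked R lie on the opposite strand. Assuming Start and End are forward-strand coordinates for both directions (as the Start < End values throughout the tables suggest), the coding sequence of an R ORF would be the reverse complement of the Start-End slice of the assembly, read from End back toward Start. The Python sketch below illustrates that assumption; `reverse_complement` and `orf_nucleotides` are hypothetical helper names, not part of the source.

```python
# Complement table; N (unresolved base) pairs with N. Case is preserved.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse, giving the opposite strand 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def orf_nucleotides(assembly: str, start: int, end: int, direction: str) -> str:
    """Extract an ORF's coding strand from 1-based, inclusive coordinates.

    Assumes the coordinates refer to the forward strand for both directions.
    """
    region = assembly[start - 1:end]
    if direction == "F":
        return region
    if direction == "R":
        return reverse_complement(region)
    raise ValueError(f"unknown direction: {direction!r}")


# Toy example: forward-strand CAC reads as a GTG start codon on the reverse strand.
print(orf_nucleotides("GGCACTT", 3, 5, "R"))  # GTG
```

For instance, positions 548-550 of assembly 3864582 read CAC on the forward strand, which on the reverse strand is the GTG (V) that opens ORF 3864582-1 (317-550, direction R).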

>[SEQ ID NO: 155] 3864510-3 ORF translation from 1164-1640, direction R 
VSSILGAGPFFGLAHEAQLKILELTAGQVATMYESPVGFRHGPKSLINDNTVVLVFGTTT
DYTRKYDLDLVREVAGDQIARRVVLLSDQAFGLENVKEVALGCGGVLNDIYRVFPYIVYA
QLFALLTSLKVENKPDTPSPTGTVNRVVQGVIIHEYQK*

Description:

AGAS PROTEIN. - ESCHERICHIA COLI. (Probable tagatose-6-phosphate
ketose/aldose isomerase)

Assembly ID: 3864526 
Assembly Length: 1940bp 

>[SEQ ID NO: 60] 3864526 Strep Assembly Assembly id#3864526 

TGCAGGATTTGATTTGGACGACTTTTATTATTACCAGATTCGCCTAGGAATAGAAAAAAG 

AGCCCAAGAGTTGGACTATGATATCTTGCGCTATTTTAATGACCACCCTTTTACCCTAAG 

CGAGGAAGTGATTGGGATTCTCTGCATCGGAAAGTTTAGTCGAGCTCAGATTTCTGCCTT 

TGAAGAATACCAAAAGCCTCTTGTATTTCTAGACAGCGATACACTTTCCCTGGGACATAC 

CTGTATTATCACGGATTTTTACACTGCTATGAAACAGGTTGTCGATTATTTCCTCAGTCA 

AGGAATGGACCGTATCGGGATTCTAACAGGCCTTGAAGAAACAACAGACCAAGAAGAAAT 

CATTCAGGACAAGCGTCTAGAAAACTTCAAAAACTACAGTCAAGCGAGGGGAATCTATCA 

TGATGAACTGGTCTTTCAAGGAAGATTTACTGCCCAGTCTGGCTATGACTTAATGAAGGA 

GGCCATTCAGAGCTTGGGAGACCAACTTCCGCCAGCATTTTTCGCAGCCAGCGATAGTTT 

AGCTATCGGTGCCCTCCGTGCCCTCCAAGAAGCTGGAATCAGCCTGCCAGATCGCGTCAG 

CCTCATTTCCTTTAACGACACTAGTCTGACCAAACAGGTCTATCCTCCCCTCTCTAGTAT 

TACAGTTTATACTGAAGAAATGGGCCGAGCAGGTATGGATATTCTTAACAAGGAAGTCCT 

CCACGGTCGGAAAATCCCTAGCCTGACCATGCTGGGAACCAGACTGACATTAAGAGAAAG 

TACCCTAAATCAAGAATAGGATAACATAAAAAACGAATAGAGTTCTAAAACTCCTATTCG 

TTTTTTATTCGATTACAATCATAGACTTAATGGTCTTACGTTCATCCATATCTTTGTAGG 

CTTGGTCGATATCTTCCAGTTTATAACTTGAAGTAAAGACGCGACCTGGATTGATATCAC 

CATCAAGGACGGCTTTTAGTAAAAATTGCTTATCGTATGTTGTAGCAGAAGCTGCCCCAC 

CTGCTACAGAGATATTTTGCATAAATGTCGAACCAAGAGCACGATTATTATAGTGTGGGA 

CTCCTACAAAGCCCATACGCCCTCCATTATGAAGAACACCTAGCGCCTGTTCTATAGCAG 

CCTCCGTACCAACACATTCAAGTGCTGCGTCTGCTCCTCCGCCGAGGATTTCACGCACCT 

TGGTAATTCCTTCTTGACCACGTTCTGCAACAACAGCTGTCGCACCTGACTCCATAGCCA 

TCTTTTGACGGTCTTCATGACGGCTCATAAGGATAATTTGTGATGCTCCACGCATCTTAG 

CCGCGATGACAGCACATTGACCAACAGCCCCATCACCGATAACAACAACCTTGTCCCCTT 

TTTGAACATTTGCAACACGCGCCGCATGATAGCCTGTCGGCATGACATCTGCAAGAGTCA 

AAAGGGACTTGAGCATCCCTTCTGTATAGTCAGAAGGTTGACCAGGGATTTTAACCAGCG 

CCCAGTTTGCATAGTGGAAGCGAATATATTCTGCCTGAAAATCACCCCCCAAATTATTGC 

CAATATGATTGTCGCAAGAACCGTCAAATCCAGCAAGACAGGCATCACACTCACCACATC 

CATGTGTAAAAGGGACAATCACAAAATCACCTGGTTTCACCGTCGTAATGGCTTCCCCAG 

CTTCTTCAACAATCCCAATCGCTTCGTGTCCACTTATTTTTTGTGTCCAACTTTCGTTTT 

CCNTGGATTACGGTACCTCCATAAATTTGAACCACAAACGCACGCACGAACCACACGAAT 

AATCACATCATCCGCTTCTATTATTTGCGGACGTTCAATGCTAGCAAGTCCAACCTGACC 
TGCCTTTGTATATACTGCTGATTTCATTTAAAATTTTCCTTCCTTATAAAGTTTAATTTT
GAGATTTAAACGATTTAAAG 

ORF Predictions:

ORF # Start End Direction Length 



1 845 1660 R 272 aa 

>[SEQ ID NO: 156] 3864526-2 ORF translation from 845-1660, direction R 

VKPGDFVIVPFTHGCGECDACLAGFDGSCDNHIGNNLGGDFQAEYIRFHYANWALVKIPG

QPSDYTEGMLKSLLTLADVMPTGYHAARVAJWQKGD 

QIILMSRHEDRQKMAMESGATAVVAERGQEGITKVREILGGGADAALECVGTEAAIEQAL
GVLHNGGRMGFVGVPHYNNRALGSTFMQNISVAGGAASATTYDKQFLLKAVLDGDINPGR
VFTSSYKLEDIDQAYKDMDERKTIKSMIVIE*

Description: 

ALCOHOL DEHYDROGENASE (EC 1.1.1.1). - ALCALIGENES EUTROPHUS.

Assembly ID: 3864548 
Assembly Length: 2051bp

>[SEQ ID NO: 61] 3864548 Strep Assembly Assembly id#3864548 

ATCGAATTTTTCTAGCCAGGCTACAGTTTTGGCAAGTAAGGTTTCATCTCAGGCAGTCAA 

CTGGGTGAGTGCCTTTATTAGCGGAGCTTCTCAAGTGATTGTTGCCTTGATTATCGTTCC 

TTTCATGCTCTTTTATCTCTTGCGTGATGGGAAAGGCTTGCGTAACTATTTGACCCAATT 

CATTCCAAGAAAATTGAAGGAACCTGTTGGACAAGTTCTATCAGATGTGAATCAACAGTT 

GTCCAACTATGTTCGAGGGCAAGTGACAGTGGCTATTATTGTAGCAGTAATGTTTATCAT 

CTTCTTCAAGATTATTGGTCTACGCTATGCGGTTACGCTGGGGGTTACTGCTGGTATTTT 

AAATCTGGTCCCTTATCTTGGTAGCTTTCTAGCCATGCTTCCTGCCCTAGTATTGGGTTT 

GATTGCTGGTCCAGTCATGCTTTTGAAAGTAGTGATTGTCTTTATTGTAGAACAAACTAT 

TGAAGGCCGTTTTGTCTCTCCATTGATTTTGGGAAGTCAATTAAACATCCACCCTATTAA 

TGTTCTCTTTGTTTTGTTAACTTCAGGATCTATGTTTGGTATCTGGGGAGTTTTACTTGG 

TATTCCGGTTTATGCCTCTGCTAAGGTTGTCATTTCAGCCATTTTCGAATGGTATAAGGT 

AGTCAGTGGTCTATATGAATTAGAGGGTGAGGAAGTCAAGAGTGAACAATAGTCAACAGA 

TGTTACAGGCTTTGGAGGAGCAAGATTTAACTAAGGCTGAGCATTATTTCGCCAAAGCTT 

TAGAAAATGATTCAAGTGATCTTCTGTATGAGTTGGCAACTTATCTTGAAGGGATTGGTT 

TCTATCCTCAGGCCAAGGAAATTTACCTGAAAATTGTAGAAGAATTTCCAGAGGTTCATC 

TTAATCTAGCTGCAATGGCTAGCGAGGATGGTCAAATAGAAAAAGCCTTTAACTATCTTG 

AGGAAATCCAAGCTGACAGTGACTGGTATGTCTCGCTCTTTGGCTCTGAAGGCAGACCTA 

TACCAGCTGGAAGGTTTGACAGATGTGGCACGTGAGAAATTATTGGAGGCCTTGACCTAC 

TCAAAGGATTCTCTCTTGATATTGGGTTTGGCAAAGTTGGATAGTGAGTTGGAAAATTAC 

CAAGCGGCTATTCAAGCCTATGCCCAGTTAGATAATCGCTCGATTTATGAGCAAACGGGC 

ATTTCCACCTATCAACGAATTGGCTTTGCCTATGCTCAGTTAGGGAAATTTGAAACGGCT 
ACTGAGTTTTTAGAAAAAGCCCTGGAGTTAGAATACGATGACTTAACAGCTTTTGAGTTG
GCCAGTCTTTATTTTGATCAAGAAGAATATCAAAAAGCCACCCTCTACTTTAAGCAGCTT 
GATACCATTTCTCCTGACTTTGAAGGCTATGAGTATGGGTACAGTCAGGCTTTACATAAG 
GAACATCAAGTTCAAGAAGCCCTGCGTATCGCTAAGCAAGGATTAGAGAAAAATCCCTTT 
GAAACTCGCCTCTTGCTAGCTGCTTCACAATTTTCTTATGAATTGCATGATGCTAGTGGT 
GCAGAAAATTATCTCCTTACTGCAAAAGAAGACGCTGAGGATACAGAAGAAATCTTGCTT 
CGTTTAGCCACTATTTATCTGGAGCAGGAGCGTTATGAGGATATTCTAGACTTGCAGAGT 
GAGGAGCCAGAAAATCTTTTGACCAAGTGGATGATTGCTCGTTCTTATCAAGAAATGGAC 
GATTTGGATACTGCTTATGAGCATTATCAAGAGTTGACAGGAGATTTGAAGGACAATCCA 
GAATTTCTGGAACACTATATCTATCTCTTGCGTGAATTGGGACATTTTGAAGAAGCAAAA 
GTCCATGCTCACACTTACTTAAAACTGGTTCCAGATGATGTGCAAATGCAAGAACTGTTT 
GAGAGATTGTAAGAATGTTTAAACATATAGAACTGTAGTTTATCTCTTTTGATAGCTACG 
GTCTTTATTTGTACATGGTAGAATCTTTTTACAAAAATACTTGGTAATCTTGTTTATTCA 
TGCCATAATAG 

ORF Predictions:

ORF # Start End Direction Length 



1 687 1055 F 123 aa 

2 979 1932 F 318 aa 

>[SEQ ID NO: 157] 3864548-2 ORF translation from 687-1055, direction F

VRKSRVl^SQQMLQALEEQDLTKAEHYFAKALENDSSDLLYELATYLEGIGFYPQAKEIY 

LKIVEEFPEVHLNLAAMASEDGQIEKAFNYLEEIQADSDWYVSLFGSEGRPIPAGRFDRC 

GT* 

Description:
unknown 

>[SEQ ID NO: 158] 3864548-3 ORF translation from 979-1932, direction F 

VTGMSRSLALKADLYQLEGLTDVAREKLLEALTYSKDSLLILGLAKLDSELENYQAAIQA 

YAQLDNRSIYEQTGISTYQRIGFAYAQLGKFETATEFLEKALELEYDDLTAFELASLYFD 

QEEYQKATLYFKQLDTISPDFEGYEYGYSQALHKEHQVQEALRIAKQGLEKNPFETRLLL 

AASQFSYELHDASGAENYLLTAKEDAEDTEEILLRLATIYLEQERYEDILDLQSEEPENL 

LTKWMIARSYQEMDDLDTAYEHYQELTGDLKDNPEFLEHYIYLLRELGHFEEAKVHAHTY 

LKLVPDDVQMQELFERL*

Description:
unknown 

Assembly ID: 3864582 
Assembly Length: 1318bp 

>[SEQ ID NO: 62] 3864582 Strep Assembly -- Assembly id#3864582

CTTTAGCAATCAGTTTATTGGGAGATTTGACTGCCACTTCTGTTGGAACCTTGATAATCT 

TTTTACCCTCAAAGCGTTCCATACCAGAAATCTTAACATCAACTGCTAAAATAACTACAT 

CCGCTGCATCAATCTGCTCTTGACTCAATTCATTTTCTACCCCTATTGTCCCCTGAGTCT 

CAACATGAATCACATGTCCAGCTACCTTTGCGGCATTCTCTAATTTTTCCTGTGCAATAT 

AAGTGTGGGCAATTCCCATAGTACAAGCTGCAACACCAACAATTTTCATACGGATACCCT 

CCAAAATTTTTTCTTATTAACAAAAAGCTGCAATCACATCATCAGATGTCTGAGCCCGAA 

CTAATTTGGCAACAACTTCGTCATTACCAAGTTTTCGAGCAAAGAGTGATAAGGTCTTCA 

AATGCTCCCTAGCAGCTTCTGTATCATCACCAACTGCAAAGAGTACAATTACTTTGACCC 

CTTTCCCATCAATGGTCTCCCAAGGAATCTCATTGTGATTTATAGCTATGACTACCCCCG 

CCTTCTCCACAGCAGAACTCTAGCTATGGGGAATAGCAATATAATTCCCAATACCGGTCT 

GTCCTTCTGCCTCTCTCTGATAAAGACCTTCGATAAATTGGTCTCTATCAGACACATAAC 

CCGTCTCAACCAATAGTATGAGCTAATGCCTCAAAAACCTCTTCTTTGCTCTGCATCTGT 

AAATCCGTCTGGATCAGACTCACATTAAGAATATCTTTGATTTCCATATATTATCTCCCG 

TAATTCTTCTTTTGTTAACTGTTTTAATTGATTTATGAATGATTCATCTGCTAGTCTTCT 

CATCAATGTTTTAATACATGACTTGTCCTGTGATACTGCAATGGCCAAACCGATAATAAG 

GTCAACACACTGGATATCCTTCGACCATTCTCTGATAGGTGGTTTTAATCTAGTAATCAC 

TAAGACATGATGTTGAAAGTTTCCTTCACAATGTGGTAGAAGAACACCTTTAGCAACCTC 

TATACTTCCCTGTCTCTCACGGTAATATAGAAGCTCTTCTATTTTTTCTGTATCTTCAGA 

AACAAGAAGGCTGATTTGATTTGCTAATTCTTTGTAGGCTTCTTGACGATTTTGAACAGA 

TATATCCATAAGGACAAGCGAAAGATTATTCATAGTTTATCTCCTGAATTTTTGCTTGAA 

GACGTTGTTTATCACCCTCGGTTAGAAAAGCACTAACTAGGACAAACGGGACACTTGCTG 

GTTCCTGCAAAGCTACCGTCGTCACAATGAAATCTAAATCTGGATATAGATTTATCAG 

ORF Predictions: 

ORF # Start End Direction Length 



1 317 550 R 78 aa 

>[SEQ ID NO: 159] 3864582-1 ORF translation from 317-550, direction R 
VEKAGVVIAINHNEIPWETIDGKGVKVIVLFAVGDDTEAAREHLKTLSLFARKLGNDEVV
AKLVRAQTSDDVIAAFC*

Description: 

Probable phosphotransferase enzyme IIA component

Assembly ID: 3864604 
Assembly Length: 2077bp

>[SEQ ID NO: 63] 3864604 Strep Assembly — Assembly id#3864604 
CTAGTCTTGGCTACTGTCTAAGTTGGCTTGTGCATAAGCCTGCCAGATTTTTTGTTGGGG 
TTTGGCAAGTGGGTAATTCTTGAATTCTTCTGGTGAAAGCCAACGAACTTCCCTATCTGA 
AAAATCATGGAAGTCACTCACCTGACCTGCTACAATCTGTACATGCCATTTTCGATGACT 
AAAAACATGCTGGACTGTATCAAAACAAACATCAAGCCAATCAACATCTAGGTCATAGTC
CTGCTGGAAACTCTCTTCTGGGACTGGGGCCAGAGTTCACACTTTCTTCCGCAACCTGAT 
GAAAGAGGTCAAACTGCTCTTCTTGCGAAAAGTTATCAACTTCTATAAAGGGGAAATGCC 
AAAAACCTGCCAAGAGCTTTTCGCTTTCATTTTTTTCAAGTAAAAATTGTCCTTGAGAAT 
TTTTCACAACTAAGGCTTTAAGATAAATAGGAACCGGCTTTTTCTTAGGAGATTTAATTG 
GATAACGGTCCATGGTTCCATTCTGATATGCCGCACTAAAGTCCTTGACTGGGCTTTCTT 
CAGGTCTGGGATTTACAGGAGACTCAATATCAGACCCTAAGTCCATCAAGGCTTGATTAA 
AATCACCCGGACGATCTGGATTAATCAAGATCTCCATCATTGCCTGAAAAATTTTTCGAT 
TACTTGGAATCCCAATATCGTGGTTGACTTCAAACAGACGCGCCAAGACCCGCATGACAT 
TACCATCTACAGCTGGCTCAGGCAAGTTAAAAGCAATACTGGAAATGGCTCCTGCTGTGT 
AAGGTCCAATCCCTTTCAAGCTGGAAATTCCTTCATAGGTATTTGGAAATTGGCCACCAA 
AGTCAGTCATAATCTGCTGGGCTGCAGCCTGCATATTGCGAACTCGAGAATAATAACCCA 
AGCCCTCCCAAGCTTTCAGTAAACTCTCCTCAGGCGCAGTTGCCAGACTTTCGACAGTTG 
GAAACCAGTCCAAAAATCTTTCGTAGTAAGGGATAACTGTATCCACCCTGGTCTGCTGAA 
GCATGATTTCAGATACCCAGATGTGATAAGGATTTTTACTTCTCCTCCAAGGCAAATCTC 
TTTTGTTTTCATCATACCAAGCGAGAAGTTTTCTCACCGGAAAGAAATGACTTTCTCCTC 
CGGCCACATGACGATACCGTATTCTTTCAAATCCTAACATATCTCTAGTTATAACACAGA 
AGGTTTCACCTGTCTTTGTATCTGATTTATAATATTTTCAATAGATAGTATATAACTTTT 
CCTATCTACTTATACTCCAATGAAAATCCAAAGAGCAAACTAAGAAGCTAGCCGCAGGTT 
GCTCAAAACACTGTTTTGAGGTTGTGGATAGAACTGACAGAGTCAGTATCATATTACCTA 
CGGCAAGGTGAAGCTGACGTAGTTTGAAAAGATTTTCGAAGAGTATAAATCTTATTGATG 
AACTGCTTGCAGTCTGAGAAAAAATGAGCTTGGATATTATTTCCAAACTCACTTAAAGTC 
AATTTCAATCCACTAGAACAAGCCTAGTACAGTTCCATCGCTTTCAACATCCATGTTGAG 
AGCTGCTGGACGTTTTGGAAGACCTGGCATGGTCATAACATCACCAGTTAAGGCAACGAT 
GAAGCCTGCACCTAATTTTGGTACCAATTCACGAATGGTAATTTCAAAGTTTTCTGGTGC 
TCCAAGCGCATTTGGATTGTCTGAGAAACTGTATTGAGTTTTAGCCATACAAATTGGCAA 
TTTGTCCCAACCGTTTTGAACGATTTGAGCAATTTGTGTTTGAGCTTTCTTCTCAAAGTT 
CACTTTGCTACCACGATAGATTTCAGTGACAATTTTTTCAATCTTTTCTTGGACAGAAAG 
GTCATTATCGTACAAACGTTTATAGTTAGCTGGATTTTCAGCAATTGTCTTAACAACTGT 
TTCGGCAAGTGCTACTCCACCTTCTGCTCCATCAGCCCAGACACTAGCCAATTCAACTGG 
TACATCGATTGAGGCACAGAGTTCTTTTAAGGCTGCAATTTCAGCTTCTGTATCAGATAC 
AAATTCGTTAATAGATACAAGCTAATGGAATACCGAA

ORF Predictions: 

ORF # Start End Direction Length 



1 1 141 R 47 aa 

2 1513 1803 R 97 aa 

>[SEQ ID NO: 160] 3864604-1 ORF translation from 1-141, direction R 
VSDFHDFSDREVRWLSPEEFKNYPLAKPQQKIWQAYAQANLDSSQD*

Description: 
unknown

>[SEQ ID NO: 161] 3864604-3 ORF translation from 1513-1803, direction R 
VNFEKKAQTQIAQIVQNGWDKLPICMAKTQYSFSDNPNALGAPENFEITIRELVPKLGAG
FIVALTGDVMTMPGLPKRPAALNMDVESDGTVLGLF* 

Description:

FORMATE--TETRAHYDROFOLATE LIGASE (EC 6.3.4.3) (FORMYLTETRAHYDROFOLATE
SYNTHETASE) (FHS) (FTHFS). - CLOSTRIDIUM ACIDI-URICI.

Assembly ID: 3864610 
Assembly Length: 1887bp 

>[SEQ ID NO: 64] 3864610 Strep Assembly — Assembly id#3864610 

CTCAAAACNCTGCTTTGAAGAGATTTTCAAAGAGTACAAGAAGTTTAGTTATTAGCGTTC 

TTACCGCTTGTAAACTAGATTTCTCATAAAATAGAATCTTTTCCTTTTAGTTGTAAACTA 

GTCTGGGAGAGTAGAGAGGTTTGAGATACCTTTCTAGCTTTTGGATTATCATCTAAGAAG 

AGTAATTTCCCTTGCATTAAAAAGGGGAAAAAGAGACACGAAATGACTATAATGGGTGAC 

AATGGGGGAAGGGATAGACAAGAGATTTTATCCACATATGAAAAAAGGAGGTTAGGAAAG 

AGTTATATATCCTATATTATATAAATAATCAATTGCGCAGAAATTTGGTAAGAATTCATG 

CGTCAACTCATAAAGAACTACTTAAAAAATTCACAGTATTCATAATTATTTTCGAGGAGA 

AAAACAGTGAAAAAAAGAAAAAAGCTTGCTCTGTCTCTTATCGCTTTTTGGCTGACGGCT 

TGTTTAGTAGGCTGTGCTAGCTGGATTGATCGTGGAGAATCCATAACGGCTGTTGGCTCA 

ACTGCCTTGCAACCCTTGGTTGAAGTAGCGGCAGATGAATTTGGCACCATCCATGTTGGA 

AAAACGGTCAATGTCCAAGGGGGAAGTTCTGGTACAGGCTTGTCCCAGGTTCAGTCTGGG 

GCAGTTGATATAGGAAACTCAGATGTATTTGCTGAGGAAAAAGACGGAATTGATGCTTCT 

GCTCTTGTTGACCACAAGGTCGCGGTAGCTGGCTTGGCTCTGATTGTCAATAAGGAGGTT 

GATGTTGATAACCTAACGACAGAGCAACTTCGTCAAATCTTCATAGGTGAGGTAACCAAT 

TGGAAAGAGGTTGGTGGTAAGGACTTACCCATCTCTGTTATCAATCGGGCAGCCGGCTCT 

GGCTCTCGTGCTACCTTTGATACTGTCATTATGGAAGGTCAGTCTGCCATGCAAAGTCAG 

GAGCAGGATTCAAATGGAGCGGTAAAATCAATCGTATCAAAAAGTCCAGGAGCTATCTCT 

TATTTATCTCTTACCTATATAGATGATTCGGTCAAAAGCATGAAGTTGAATGGCTATGAC 

TTAAGTCCAGAAAATATAAGTAGCAATAATTGGCCCTTGTGGTCTTATGAGCATATGTAT 

ACATTGGGGCAGCCCAATGAGTTGGCTGCAGAATTTCTCAATTTTGTTCTCTCGGATGAG 

ACCCAAGAAGGGATTGTCAAAGGATTGAAGTATATTCCGATTAAGGAAATGAAGGTTGAA 

AAAGATGCTGCCGGAACTGTGACAGTGTTGGAAGGGAGACAATAATGAATCAAGAAGAAT 

TAGCTAAGAAAATGTTGCTTCCATCAAAGAATTCTCGTCTGGAGAAATTAGGAAAAGGTT 

TGACCTTTGCCTGTCTTTCTTTGATAGTCATCCTTGTGGCCATGATTTTGGTTTTCGTAG 

CGCAAAAAGGCTTGTCGACCTTCTTTGTCAATGGTGTGAATATCTTTGACTTTCTTTTGG 

GAGGAACTTGGAATCCTTCTAGTAAAGAATTTGGTGCCCTTCCTATGATTTTGGGTTCCT 

TTATCGTTACCATTCTCTCAGCCCTTATCGCAACACCCTTTGCTATTGGTGCAGCAGTTT 

TTATGACCGAAGTATCACCAAAAGGGGCGAAGATTTTGCAACCAGCTATTGAACTCCTGG 

TTGGGATTCCTTCAGTAGTGTACGGATTTATTGGCTTGCAAGTCGTCGTTCCCTTTGTTC 
GCAGTGTCTTTGGTGGGACTGGTTTTGGGATTTTGTCAGGGATTTCCGTCCTCTTTGTCA
TGATTTTGCCGACCGTAACCTTTATGACAACGGATAGCTTGCGTGCGGTTCCTCCNTTAT 
TATCGTGAAGCCAGTTTCGCTATGGGA 

ORF Predictions: 

ORF # Start End Direction Length 



1 427 1305 F 293 aa 

>[SEQ ID NO: 162] 3864610-1 ORF translation from 427-1305, direction F

VKKRKKLALSLIAFWLTACLVGCASWIDRGESITAVGSTALQPLVEVAADEFGTIHVGKT 

VNVQGGSSGTGLSQVQSGAVDIGNSDVFAEEKDGIDASALVDHKVAVAGLALIVNKEVDV

DNLTTEQLRQIFIGEVTNWKEVGGKDLPISVINRAAGSGSRATFDTVIMEGQSAMQSQEQ 

DSNGAVKSIVSKSPGAISYLSLTYIDDSVKSMKLNGYDLSPENISSNNWPLWSYEHMYTL

GQPNELAAEFLNFVLSDETQEGIVKGLKYIPIKEMKVEKDAAGTVTVLEGRQ* 

Description: 

PROBABLE ABC TRANSPORTER BINDING PROTEIN PRECURSOR (ORF108). -
BACILLUS SUBTILIS. (BLAST) 

Assembly ID: 3864716 
Assembly Length: 405bp

>[SEQ ID NO: 65] 3864716 Strep Assembly Assembly id#3864716 

CTGAGGAATCAAAAGTTGAACCACCAGTAGAACAAGCATAAGTCCCAGAACAACCCGTGC 

AACCTACACAAGCTGAGCAACCAAGTACACCAAAAGAATCATCACAACAAGAAAATCCTA 

AAGAAGATAGGGGAGCGGAAGAGACTCCGAAACAAGAAGATGAACAGCCAGCAGAAGCCC

AAGAAATCAAGGTTGAAGAACCAGTAGAATCTATAGAGGAGACTGTCATTCAACCTGTTG 

AACAACCAAAAGTGGAAACGCCTGCTGTTTAATAACTAACGGAACCTACAGAGGAACCTA 

AAGTTGAAGTAACTAGTATTCCCCTCACTACTCGCTATGAGGAAGACCTTACTTACGAAC 

ACGGAACGCGTTGAAGTTGTTAAGGAAGGTTATAATTGGCAGTAT 

ORF Predictions: 

ORF # Start End Direction Length 



1 57 272 F 72 aa 

>[SEQ ID NO: 163] 3864716-1 ORF translation from 57-272, direction F

VQPTQAEQPSTPKESSQQENPKEDRGAEETPKQEDEQPAEAQEIKVEEPVESIEETVIQP 

VEQPKVETPAV* 

Description: 
unknown 

Assembly ID: 3864718
Assembly Length: 1542bp 

>[SEQ ID NO: 66] 3864718 Strep Assembly -- Assembly id#3864718

CTATGGGATTGGTAGTTCTTCCTAGTGCAGGGGCTGTAGACCCAGTTGCGACCCTAGCGC 

TGGACTAGTCGAGAGGGTGTTGTTGAAAATGGATGGCTATCGCTATGTTGGTTATCTATC 

AGGTGACATCCTCAAAACGCTTGGCTTGGACACTGTTTTAGAAGAAACCTCAGCAAAACC 

TGGAGAGGTGACTGTAGTCGAAGTTGAGACTCCTCAATCAACAACAAATCAGGAGCAAGC 

TAGGACAGAAAACCAAGTAGTAGAGACAGAGGAAGCTCCAAAAGAAGAAGCACCTAAAAC 

AGAAGAAAGTCCAAAGGAAGAACCAAAATCGGAGGTAAAACCTACTGACGACACCCTTCC 

TAAAGTAGAAGAGGGGAAAGAAGATTCAGCAGAACCATCTCCAGTTGAAGAAGTAGGTGG 

AGAAGTTGAGTCAAAACCAGAGGAAAAAGTAGCAGTTAAGCCAGAAAGTCAACCATCAGA 

CAAACCAGCTGAGGAATCAAAAGTTGAACCACCAGTAGAACAAGCAAAAGTCCCAGAACA 

ACCCGTGCAACCTACACAAGCTGAGCAACCAAGTACACCAAAAGAATCATCACAACAAGA 

AAATCCTAAAGAAGATAGGGGAGCGGAAGAGACACCGAAACAAGAAGATGAACAGCCAGC 

AGAAGCCCAAGAAATCAAGGTTGAAGAACCAGTAGAATCAAAAGAGGAGACTGTTAATCA 

ACCTGTTGAACAACCAAAAGTGGAAACGCCTGCTGTAGAAAAACAAACGGAACCAACAGA 

GGAACCAAAAGTTGAAGTAACAAGTATTCCCCAAACTACTCGCTATGAGGAAGACCTTAC 

TAAGGAACACGGAACGCGTGAAGTTGTTAAGGAAGGTAAGAATGGCAGTAGAACAGTTAC 

TACTCCATATATCTTGAATGCGACAGATGGTACGACTACAGAAGGCACTTCGACAACTGA 

TGAAGCTGAGATGGAGAAAGAGGTTGTTCGTGTTGGCACGAAACCCAAAGAAAAATTAGC 

TCCAGTCTTAAGTTTGACAAGTGTTACAGATAATGCAATGTTGCGTAGTGCGAGACTTAC 

TTATCATTTGGAAAATACAGATAGTGTTGATGTGAAAAAAATTCATGCTGAAATTAAAAA 

TGGCGATAAGGTTGTCAAAACTATTGACTTATCTAAAGAGAGATTATCAGATGCTGTTGA 

CGGTCTTGAACTTTATAAAGATTATAAGATTGTGACGAGTATGACCTATGATAGAGGTAA 

TGGTGAAGAAACCTCTACGTTGGAAGAAACTCCACTACGATTAGACCTCAAGAAGGTTGA 

ATTGAAAAACATCGGCTCTACTAATCTCGTCAAAGTAAATGAGGATGGTACTGAGGTGGC 

AAGTGACTTCTTAACAAGTAAACCTGTGGATGTGCAGAATTACTACCTCAAAGTAACTTC 

CCGTGATAATAAAGTTGTTTCCCCTCCCAGTTGAAAAAATTGAAGAGGTGACTGAGGAAG 

GTCCACCACTTTACAAAGTCCCTGCTAAGGCCCTAATTTGAT 

ORF Predictions: 

ORF # Start End Direction Length 



1 77 1474 F 466 aa 

>[SEQ ID NO:164] 3864718-1 ORF translation from 77-1474, direction F 

VLLKMDGYRYVGYLSGDILKTLGLDTVLEETSAKPGEVTVVEVETPQSTTNQEQARTENQ

VVETEEAPKEEAPKTEESPKEEPKSEVKPTDDTLPKVEEGKEDSAEPSPVEEVGGEVESK

PEEKVAVKPESQPSDKPAEESKVEPPVEQAKVPEQPVQPTQAEQPSTPKESSQQENPKED

RGAEETPKQEDEQPAEAQEIKVEEPVESKEETVNQPVEQPKVETPAVEKQTEPTEEPKVE

VTSIPQTTRYEEDLTKEHGTREVVKEGKNGSRTVTTPYILNATDGTTTEGTSTTDEAEME
KEVVRVGTKPKEKLAPVLSLTSVTDNAMLRSARLTYHLENTDSVDVKKIHAEIKNGDKVV
KTIDLSKERLSDAVDGLELYKDYKIVTSMTYDRGNGEETSTLEETPLRLDLKKVELKNIG
STNLVKVNEDGTEVASDFLTSKPVDVQNYYLKVTSRDNKVVSPPS*

Description: 
unknown 

Assembly ID: 3864802 
Assembly Length: 1321bp

>[SEQ ID NO: 67] 3864802 Strep Assembly Assembly id#3864802 

ATCGAATTACTTCAACTCCAACTTTACTCTCAATAAAAATCAAATGTAAAAAGAGGAGCT 

AAATTTATCTTTTTCTCCTCCTTCATCGTTCTTACTTTTGACCATAATAAGCATTTGGTC 

CATGTTTACGTTGGTAGTGTTTTTCTAGTATGTACTGGGGAGCAGGTTCAACTCTTGGAT 

TGATTTGTTCTGTAAAGCGATTCATCTTTGATACTTCCTCTAGTACGACAGAGTGATAAA 

CAGCATTCTCTGGATTTTTGCCCCAGGTGAATGGACCGTGATTGCGTACAACAATTCCTG 

GTACTTCAACCGGGTTAAGTCCGCGATGTTCAAACTCTTCTACGATAACCAGGCCAGTAT 

CTTTTTCATAGGCCACTTCTACTTCGTCCTTGGTCAAACTACGGGCGCAAGGGATTGAAC 

CGTAGAAATAATCTGCATGGGTTGTTCCGTAGAAAGGAATATCACGACCTGCCTGAGCCC 

AAGCAACAGCTTCTGTCGAATGGGTGTGAACCACACTACCAATTTCTGACCAAGCCTTAT 

ATAATTGCACATGAGTTGGGAAGTCGGAAGATGGTCTTAAATCCCCTTATAGGATCTTAC 

CATCTAGATCAGTCACTACCATGTTTTCAGGTGTCAATTCGTCATAATCCACGCCTGATG 

GTTTGATAACAATGACACCGAGTTCGCGATTGACTTCAGATACATTCCCCCAGGTAAATT 

TGACAAGTCCATGTTTTGGCAATGATTGATTGGCATCACAGACTCGTTTACGCATAGCAT 

TGATTACTTGATTCATCTTACATCAAACCTGCTTTCTTAATGAGTGGATAGAGAAAAGCT 

TGCGCCTCTTGAATGGCTGCGCGTGTTTCTTCTACTGTTTCACAATTTTCAGACCACATT 

TCGATTAGGAAAGGTCCATTATAATTGGTTTCCTTTAAAATATCGAAAGCTTCTTCCCAT 

TTGACACAACCTTGCCCAAAAGGTACATCTCGGAACTGGCCCTTTGAACTTTCTGTCACT 

GCATAAGTATCCTTGAGATGGAGAGTTGCGATGGCATGATGACCAAGATAAAACTCACTA 

TAGATATCATTATGCCATGCAGACACATTACCAATATCTGGATATACAAAGAGGAAGGGA 

GAGTCAATCTCTTTTTCTATAGCCAAATATTTTTCGATGCTATTGATGAAAGGATCATCC 

ATAATTTCAATAGCAAGTACCACCTGAGCTTCTTCAGCCCAGTCACAGGCTTTTCTCAAA 

TTTTTGATAAAACGTTGGCGTGTCTGGGGTGACTTTTCCTCATAGTAAACATCGTAACCA 

G 

ORF Predictions: 

ORF # Start End Direction Length 



1 92 550 R 153 aa 

>[SEQ ID NO: 165] 3864802-1 ORF translation from 92-550, direction R 

VQLYKAWSEIGSWHTHSTEAVAWAQAGRDIPFYGTTHADYFYGSIPCARSLTKDEVEVA 

YEKDTGLVIVEEFEHRGLNPVEVPGIWRNHGPFTWGKNPENAWHSWLEEVSKMNRFT 
EQINPRVEPAPQYILEKHYQRKHGPNAYYGQK*
Description: 

L-RIBULOSE-5-PHOSPHATE 4-EPIMERASE (EC 5.1.3.4). - ESCHERICHIA COLI.

Assembly ID: 3864854 
Assembly Length: 1265bp

>[SEQ ID NO: 68] 3864854 Strep Assembly Assembly id#3864854 

TTTTTCTGTTTTTCGGAGCAAACTGGGCTCCAGCCGGTTTTGGCCTTCTTTCCTTAGCTA 

CAGCTGGTTTAGCTGGCTCAGATTTTTCGGCTTTCTTTTCTGCACTTACTTTTGGTGCTG 

CAGGTTTTGCTTCTACTTTCGGAGCAGCTGCAGGCTTAAAGCTGGCAGCAATTTTTGCAG 

CGACAGCTTCTTCCACACTTGATGAGTGGCTTTTCACATCCAAGCCCAACTCTTTTGCAC 

GCGCTACAACTTCTTTACTTTCTTTTCCAAGTTCTTTTGCGATTTCGTACAATCTTTTCT 

TAGACAAATCATGTCCTCCTCTTCTATTCCATAAGAGACCTCATTTTCTTTGTAAATCCA 

GCATCTGTTACAGCCAAAACCTTTCTCGATTTCCCGACTGCTATGATTAATTCCAGTGTT 

GAAAACACGGTTACAATTTCTACTTGATAATAATGACTTTTATCTTGAATCTTCTTGGTC 

AGATTGGGTCCAGCATCATGAGCTAGAAAGACCAACTTGGCCTTGCCGTCTTGAATGGCC 

TTGACCACCAATTCTTCACCCGATATGATGCGCCCTGCTCGCTGAGCAAGCCCCAAGAGA 

TTACTTATCTTTTGCTTATTCAAGTCCCAACTCTCTTCTTTTCACTTTGTGATCCACATA 

AGCGATCAACTCGTCATAAAAGCTTTCTTCCACTTCCATGCTAAAGCTGCGGTTAAAGAC 

CTTCTTCTTTTTCGCCTCTAGGGCTTCTGCATTGTCTAGTTTGATATAAGCGCCGCGGCC 

ATTGGCCTTGCCCGTAGGATCAATAAAGACTTGTCCTTCCTTGTTCTTGACAATGCGGAG 

CAAATCACGCTTATCAATCACTTCGTTAGACACAACAGACTTGCGCAAAGGGATTTTTCT 

TGTTTTCATCTTTCCCTCCTCTAGCAGCTTTTATTCTTCTACAGTATCGTTTTCTACTTC 

CAACTCTACTGAAGCAGCGTCTTCCATGGCTTCAAATTCGCTAGCAGACTTGATATCGAT 

ACGGTAACCAGTCAAGTGAGCCGCCAAGCGCACGTTTTGTCCACGACGACCAATGGCAAG 

AGAAAGCTTGTTATCTGGAACAACCACCAAGGCACGTTTGCTGTCGTTTTCATCAAAGAT 

AACTTGGTCAACCTCAGCAGGAGCGATGGCATTGTAGATAAATTCAGCTGGATCTGCTAC 

CCACTCGATAACATCGATATTTTCTTCGATTGGTACCATGCGGTCATTTTTAGCATCGTA 

ACGAG

ORF Predictions: 

ORF # Start End Direction Length 



1 324 548 R 75 aa 

>[SEQ ID NO: 166] 3864854-1 ORF translation from 324-548, direction R 
VVKAIQDGKAKLVFLAHDAGPNLTKKIQDKSHYYQVEIVTVFSTLELIIAVGKSRKVLAV
TDAGFTKKMRSLME*

Description: 
PROBABLE 11.1 KD RIBOSOMAL PROTEIN IN NUSA-INFB INTERGENIC REGION
(ORF4). - BACILLUS SUBTILIS.

Assembly ID: 3864862 
Assembly Length: 1305bp

>[SEQ ID NO: 69] 3864862 Strep Assembly -- Assembly id#3864862

ATAAACCAAAGGAAGCTGAGCTCTTTAGTCCCAGCTTCTTTTTATATATAAAATTTTACC 

CGTGAAAAGACAGGGCCTTAGCAGACTTCTTTTTTACTTCGTTCACCCTTGCTTTTTCTT 

TGTATGTTTGGGCGTTGGCAGTTGGTTATACATAGCTAAAATCAGGTCTTATAGAAACAT 

CTTATTATCAAGTTCTTCCACTCAAATCATTTCTTTGGCACCTTTGTATGGAAACTCAAA 

AGAAGATTGGTCAATCTTATCTAAGACTGCTTGCACGGGTTTAACTAAAAGCGATCGTCA 

TAAATGCCGCCAATAATCTTGCCGCGGAAGTAAAGAATATACTCCCCCATCATGGAACGG 

TAAGTCACATCATCTAATCCTGATAATTGTTCCAAAACAAATTCCAAATAGTTCTTACTT 

GATGCCATTTCTAATCTTCTAGGCTCTGTTCAACGATAACAACCGTATAGAGTTCTTGCT 

TAACCTCGCATCCAATTGATTTAAAGCCCTGCTTTTCCCAAAAATGCTGAGATTGCGGAT 

TTCCCTTAACATAAGCCAAACGTGCCTTTCGAAAGTTCTTAGCAAAATAAGCTAGTGCTT 

CTGTCACAATATGACTACCAATCCCTTTCCTCTGATAGGCTTGATCAACCATAAACAAAC 

CAATAAAAACAGTCTCCTCATCAGGATATGCATAGACAAAATCCATAACAGCCACAAGGT 

CAAATCCATTCCAAAATCCAACAAAAAACTTATCAGCCTTAGCTTTACCTTCAGGTAGAC 

AAAGCATGTCCTCTTTTACAGTTGCAAAATTTGGCTCTGGTGGACAATGCTGAAAATACA 

GAGGATTACTTTCATATAAAGATAAAATACTTGGAATATCCTTTTCAGTTAGTATCCTAC 

AACTGTAATACTTAGATAGTTGGTCAATCATCTTTTCAAATTCGATACTTTCTTGTGCCC 

TGTGATTATGACACAGGAAGATGCACTGATCGTCATCAGCCACATAAAAGTTCTTTCCAT 

CGTGCCTAATCGTTGTCTCAAACCTTTGGATAAAACCTTTAGCCTATACAACTGGATTTT 

CCTCTCTCAAAAGTATATTCTTTTGCAGGCGAACTTCCTCAAAATCAGTCGTGTGCAACT 

TCAGTAGAATATTCATAGGCTCGGATAATCTGAGCGACAACAGGATGGCGAACCACATCC 

TTGGCTGAAAAATGAACAAAGTCAATCTGATGGATGTTCTTGAGTTTCTCTTGAGCATCA 

ATCAAACCGGACTTGACATTACGTGGCAGGTCAATCTGACTAATA 

ORF Predictions: 

ORF # Start End Direction Length 



1 431 1003 R 191 aa 

>[SEQ ID NO: 167] 3864862-1 ORF translation from 431-1003, direction R 

VADDDQCIFLCHNHRAQESIEFEKMIDQLSKYYSCRILTEKDIPSILSLYESNPLYFQHC 

PPEPNFATVKEDMLCLPEGKAKADKFFVGFWNGFDLVAVMDFVYAYPDEETVFIGLFMVD

QAYQRKGIGSHIVTEALAYFAKNFRKARLAYVKGNPQSQHFWEKQGFKSIGCEVKQELYT
VVIVEQSLED*

Description: 
unknown 

Assembly ID: 3864888
Assembly Length: 1742bp 

>[SEQ ID NO:70] 3864888 Strep Assembly — Assembly id#3864888 

CTAATCTCCTTAAAACGTGATCTTTTCAAGAATATTTTTATCTAAACAATCCAGCAAGTC 

TTGGTAAGAATAGACTTCGTAAGTCGGCTGGGCTTGTGTGTGATTTTCGAGGTGATGAGG 

ATTATACCAGATAGTGTCAATCCCCGCATTATTGCCACCTTGAATGTCGGCGGTTAGAGA 

ATCTCCAATCATCAGCGTCTTTTCTTTACTAAATCCAGCAATTTGCTGGCCAATCTTTTC 

ATAAAAAAGAGCATCCGGCTTTTGAGTTTGCAACTGTTCTGAGATAAAGACTTGATTGAA 

ATAAGGTGCTAGACCAGATTGAGCCAAACGTCCTGTCTGAATGGCAGTAATGCCATTTGT 

CGCAGCATACAAGTTATAATCACGCTCAATGAGGCTGTCCAAGAGATCATGAGCGCCCGA 

TAGTGTTTGTCCCTGCTGGGCGAGGTAAAATTGGTAACGCTGGGCAAGAAAACTACCGTC 

TTTTTCCTGTCCAAAATGAGCAAATAAACGAGAAAAGCGCGTGTTAACCAGCTCTTGTTT 

ACTGATTTTCTTCAGCTCCAAGTCTTTCCAGAGAGCCTTGTTCATAGGAACGTAATAATC 

TTTATAAGCCGGAATATCCGCAACTCCTTCTTCTTTTAGAAGTGGAGTCAAAGCCACATC 

CTCAGCAGCATCAAAATCAAGAAGAGTGTGGTCGAGGTCGAAGAGTACAAATTTGTAGAA 

CAATTTGAGGTTTTCCTTTCTGAAAATTCATTAAGAACATTATATCATAAAGCACCTCAT 

ACAATTAACTAATTTAATCACTTAAAAAAAATTCGAACACTTTCTATACAACTGACAGCT 

CAAATCTTTCAGAATAGAACAATACTAACTATCGAACACCCCGTCTTCATAAATACATAT

GTAATTCTAGGCCTAGAATTCCTATAAACTAAATGCTTTCATACTCTTCCAAGTAATTGA 

TTGCCTTAAATTTTAATTTTTGAAGGTTTCTAAAGCTAGAATAGCCCCATCACAATCAGT 

TTTGATTGATTCACAATTTAGAAACACTATAGTTTCACTCCTGTTAAAATAAAAAGGAAC 

TGCATAAAGCAATCCCTTTCTGATTTTGAAATCATTTACTTAACATTTTATAGTTGAGAT 

AATCAATAGCTTATCTATAAAAAGAGTTATAGTAAAATTCCTTATTTATTGATTCCAAGC 

TCCGCTAACTGTATTTGAATAACTGACAGTTCTGCACCAGCCTGAAAAAGAGCAGCTGCA 

TTATAGGCACCTTCTACAATTGGAACCCTGTTGATGATGATACTTTTATCACTGAAATCA 

GTCACCATTTTTAAGTTCATTTTAGCAGAACCTAGGTCAAAAAAGGCAAGTAAAGTATCT 

GCTGGATTTTCGGAAACAACCCTATCTACTTGATCAAAACTCGTTCCAATTCCTCCGCCC 

TCGGTTCCTCCTACATAAGTAATCGGAACATCTTTAGCTACTTTACTAATCAGTTCAACA 

ACACCTTCTGCAATGTGTTTGGAATGTGAAACGATAACAAGACCAATACCAATACTTTCC 

ATCAAACCACTCCAGTTTCTAAAATAGCAGTAAAGAGTAATCCTGATGAGAATGATCCAG 

GATCAATATGTCCAAGAAACCACATGCTCCTAAGACAAGAGCTAACAGACTGGCCATCAA 

TAATAGTATTGTTCTTTTTTTCATCATTACTCCTTAACTAGTGTTTAACTGATTAATTCG 

AT 

ORF Predictions: 

ORF # Start End Direction Length 



1 10 657 R 216 aa 

>[SEQ ID NO: 168] 3864888-1 ORF translation from 10-657, direction R 
VALTPLLKEEGVADIPAYKDYYVPMNKALWKDLELKKISKQELVNTRFSRLFAHFGQEKD
GSFLAQRYQFYLAQQGQTLSGAHDLLDSLIERDYNLYAATNGITAIQTGRLAQSGLAPYF
NQVFISEQLQTQKPDALFYEKIGQQIAGFSKEKTLMIGDSLTADIQGGNNAGIDTIWYNP
HHLENHTQAQPTYEVYSYQDLLDCLDKNILEKITF* 

Description:
unknown 

Assembly ID: 3864898 
Assembly Length: 1136bp

>[SEQ ID NO:71] 3864898 Strep Assembly -- Assembly id#3864898 

GTGGAATGCGGGGACGCCTTGTCTAATTTTGGATCAAGCCCTGAGTTTGACACAGGGAAA 

TGAGCTGGACGGACTGCTATCTCTGAAGAAATTACTGGCACCATTAGCCTATCAGCCTTG 

GATGATTATGTGGCGGCCTTGTCTCAACAGGATGTTCCCAAAGCTTTGTCTTGCTTGAAT 

CTTCTTTTTGACAATGGTAAGAGCATGACTCGTTTTGTGACCGATCTTTTGCACTATTTA 

AGAGACTTGTTAATTGTTCAAACAGGGGGAGAAAATACTCATCATAGTTCAGTCTTTGTA 

GAAAATTTGGCACTTCCTCAAAAAAATCTGTTTGAAATGATTCGCTTAGCAACAGTGAAT 

TTAGCAGATATTAAGTCTAGTTTGCAGCCCAAGATTTATGCTGAAATGATGACCGTCCGT 

TTGGCGGAAATCAAGCCCGAACCAGCTCTATCAGGAGCGGTTGAAAATCGAATTGCTACG 

CTGAGACAGGAAGTTGCCCGTCTCAAACAAGAGCTTTCTAATGCAGGTGCGGTTCCTAAA 

CAAGTTGCACCAGCTCCTAGTCGACCAGCTACGGGCAAAACAGTCTATCGTGTCGATCGC 

AATAAAGTGCAATCTATCTTACAAGAGGCCGTCGAAAATCCTGATTTAGCACGTCAAAAT 

CTAATTCGTTTGCAGAATGCCTGGGGAGAGGTAATTGAAAGTCTAGGTGGGCCGGACAAG 

GCTCTGCTAGTTGGTTCTCAACCGGTTGCTGCCAATGAACACCATGCTATTCTTGCTTTT 

GAGTCTAACTTCAATGCTGGTCAAACTATGAAACGAGACAATCTCAATACCATGTTTGGT 

AATATCCTCAGTCAGGCGGCAGGTTTTTCACCTGAGATTTTAGCTATTTCCATGGAGGAA 

TGGAAAGAAGTTCGCGCAGCCTTTTCAGCCAAAGCCAAATCTTCTCAAACTGAAAAAGAA 

GTAGAAGAAAGCCTGATTCCAGAAGGATTTGAATTTTTGGCTGATAAAGTGAAGGTAGAG 

GAAGACTAAAGAAAGATTTCATGATACAATAAGTTTATGAATAAACAACAATTTATTATT

ATGGCGCTATTTACAGCTGCTGAGACCTATTTTTTCAATGAAGCCTGGATGACTGG 

ORF Predictions:

ORF # Start End Direction Length 



1 130 1029 F 300 aa 

>[SEQ ID NO: 169] 3864898-1 ORF translation from 130-1029, direction F 

VAALSQQDVPKALSCLNLLFDNGKSMTRFVTDLLHYLRDLLIVQTGGENTHHSSVFVENL 

ALPQKNLFEMIRLATVNLADIKSSLQPKIYAEMMTVRLAEIKPEPALSGAVENRIATLRQ

EVARLKQELSNAGAVPKQVAPAPSRPATGKTVYRVDRNKVQSILQEAVENPDLARQNLIR 

LQNAWGEVIESLGGPDKALLVGSQPVAANEHHAILAFESNFNAGQTMKRDNLNTMFGNIL

SQAAGFSPEILAISMEEWKEVRAAFSAKAKSSQTEKEVEESLIPEGFEFLADKVKVEED*

Description:
unknown 

Assembly ID: 3864938 
Assembly Length: 1670bp

>[SEQ ID NO: 72] 3864938 Strep Assembly — Assembly id#3864938 

CTGTCTCTGAAACAGTCACATCAAGTGCCTCTGAACAANCGCCCCNCCTAGGTNGACGGT 

ATCGATAAGCTCGATCTGTGATTTCAGAGAAGAAATCAAGTGCTGTAACAGAAGTAAGAT 

GTAATTGTATGTAAAGGAGACGTCATGTTAAATAGTATTGTAACCATTATTTGTATTGCC 

CTTATCGCGTTTATCTTGTTTTGGTTTTTCAAAAAGCCTGAAAAATCTGGACAAAAAGCC 

CAGCAAAAAAACGGATACCAAGAGATTCGAGTGGAAGTCATGGGAGGCTATACTCCTGAG 

TTGATTGTCCTCAAGAAATCAGTGCCAGCCCGCATTGTCTTTGACCGCAAGGATCCTTCA 

CCATGTCTGGATCAAATTGTTTTTCCAGATTTTGGTGTACATGCGAACCTGCCAATGGGG 

GAAGAGTATGTAGTGGAAATCACGCCTGAACAGGCTGGAGAGTTTGGCTTTGCTTGTGGT 

ATGAACATGATGCACGGCAAGATGATTGTAGAGTAGGTGGAGACTATGACAGAAATTGTG 

AAAGCAAGCTTAGAAAATGGCATTCAAAAAATCCGTATCCGAGCTGAAAAAGGCTATCAT 

CCAGCCCATATCCAGCTTCAAAAGGGAATTCCAGCTGAGATTACCTTTCATTCGTGCTAC 

TCCTTCAAACTGTTATAAGGGAAATTCTGTTTGAAGAAGAAGGTATCTTGGAAGCAATCG 

GCGTAGATGAGGAGAAAGTCATTCGTTTTACACCTCAAGAATTAGGGAGACATGAATTTT 

CTTGTGGCATGAAGATGCAAAAGGGAAGCTATATAGTCGTTGAGAAGACTCGAAAATCTC 

TATCTCTCCTGCAAACGTTTTTGGATTACTAGTATCTTTACTGTGCCTCTTGTGATTCTC 

ATGATTGGGATGTTGGCAGGTAGCATTAGTCATCAAGTCATGCATTGGGGAACCTTTTTA 

GCAACAACGCCTATTATGTTAGTTGCGGGTAAGCCATATATCCAGAGTGCTTGGGCCAGT 

TTTAAAAAGCACAATGCCAACATGGATACCTTGGTTGCGCTGGGAACTCTAGTGGCTTAT 

TTCTATAGCCTAGTTGCTCTCTTTGCTGGTCTCCCTGTTTACTTCGAAAGTGCTGGATTT 

ATCCTCTTTTTCGTTCTTTTGGGAGCAGTTTTTGAGGAAAAAATGAGGAAAAATACGTCC 

CAAGCTGTGGAGAAATTACTGGACTTGCAAGCTAAAACCGCAGAAGTCTTGAGTGATGAT 

AGTTATGTCCAAGTTCCTTTGGAACAAGTCAAGGTACGCGACCTTGATTCCAGTGCGTCC

CGGTGAAAAGATTGCTGTTGATGGTGTCGTAGTAGAAGGTGTCTCTAGTATTGACGAATC 

CATGGTGACAGGTGAGAGTCTGCCTGTGGACAAGACAGTTGGAGATACTGTCATTGGCTC 

AACCATCAATCATAGTGGAACGCTTGTCTTTAGAGCAGAAAAAGTTGGCTCAGAGACTGT 

TTTGGCTCAGATTGTAGATTTTGTGAAGAAAGCTCAGACAAGTCGTGCGCCGATTCAGGA 

CTTGACGGATAAGATTTCAGGGATTTTTGTCCCAGTAGTTGTCATTTTAGGAATCATGAC 

CTTTTGGGTTTGGTTCGTCTTGCTCAGGGATAGTGTGGTCGTGCTTGGAG 

ORF Predictions: 

ORF # Start End Direction Length 



1 883 1326 F 148 aa 

>[SEQ ID NO: 170] 3864938-2 ORF translation from 883-1326, direction F 
VPLVILMIGMLAGSISHQVMHWGTFLATTPIMLVAGKPYIQSAWASFKKHNANMDTLVAL 
GTLVAYFYSLVALFAGLPVYFESAGFILFFVLLGAVFEEKMRKNTSQAVEKLLDLQAKTA
EVLSDDSYVQVPLEQVKVRDLDSSASR*

Description: 
ATCS_SYNP7 
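The forward-strand translations in these listings can be reproduced mechanically. The Python below is a minimal sketch under stated assumptions, not the software used to produce the listings. It assumes that Start and End are 1-based, inclusive positions on the depicted strand, and it uses the standard genetic code (NCBI translation table 1). The listings render the initiating GTG as V rather than M, so the sketch also reads every codon literally. It emits X for any codon that contains an ambiguous base such as N. The function names `translate` and `forward_orf` are hypothetical.

```python
# Standard genetic code (NCBI translation table 1); codons indexed in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def translate(dna: str) -> str:
    """Translate codon by codon: '*' marks a stop, 'X' a codon containing a non-ACGT base."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    protein = []
    for i in range(0, usable, 3):
        codon = dna[i:i + 3]
        if all(base in BASES for base in codon):
            index = 16 * BASES.index(codon[0]) + 4 * BASES.index(codon[1]) + BASES.index(codon[2])
            protein.append(AMINO_ACIDS[index])
        else:
            protein.append("X")
    return "".join(protein)


def forward_orf(assembly: str, start: int, end: int) -> str:
    """Translate a direction-F ORF given 1-based, inclusive Start/End positions."""
    return translate(assembly[start - 1:end])


if __name__ == "__main__":
    # First 33 nt of the ORF at 883-1326 in assembly 3864938 (positions 883-915).
    print(translate("GTGCCTCTTGTGATTCTCATGATTGGGATGTTG"))  # VPLVILMIGML
```

The initiating GTG is deliberately left as V. A tool that substitutes M for an alternative start codon would disagree with the listings only at the first residue.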

Assembly ID: 3864956 
Assembly Length: 1252bp

>[SEQ ID NO: 73] 3864956 Strep Assembly — Assembly id#3864956 

ACAAGAACAATTGGAACAGGTACAGGCTGTTAAAAAATCGATTAACACAGCTAGTGAAGA 

AGTGAAAAACCAAGTCTTGCTACCCATGGCTGATCACTTAGTGGCTGCTACTGAGGAAAT 

TTTAGCGGCTAATGCCCTCGATATGGCAGCGGCTAAGGGGAAAATCTCAGATGTGATGTT 

GGATCGTCTTTATTTGGATGCAGATCGTATAGAAGCGATGGCAAGAGGAATTCGTGAAGT 

GGTTGCCTTACCAGATCCAATCGGTGAAGTTTTAGAAACAAGTCAGCTTGAAAATGGTTT 

GGTTATCACAAAAAAACGTGTAGCTATGGGGGTCATCGGTATTATCTATGAAAGCCGTCC 

AAATGTGACGTCTGATGCGGCTGCTTTGACTCTTAAGAGTGGAAATGCGGTTGTTCTTCG 

TAGTGGTAAGGATGCCTATCAAACAACCCATGCCATTGTCACAGCCTTGAAGAAGGGCTT 

GGAGACGACTACTATTCATCCAAATGTGATTCAACTGGTGGAGGATACTAGCCGTGAAAG 

TAGTTATGCTATGATGAAGGCCAAGGGCTATCTAGACCTTCTCATTCCTCGTGGAGGAGC 

TGGCTTGATTAATGCAGTAGTTGAGAATGCCATTGTGCCTGTTATCGAGACAGGAACTGG 

GATTGTCCATGTTTATGTCGATAAGGACGCAGATGACGACAAGGCACTGTCTATCATCAA 

CAATGCCAAAACCAGTCGTCCTTCTGTCTGCAATGCCATGGAGGTTCTGCTGGTTCATGA 

AGACAAGGCAGCAAGCTTCCTTCCTCGCTTGGAGCAAGTGCTGGTTGCAGATCGAAAAGA 

AGCTGGGTTGGAACCAATTCAATTCCGCCTAGATAGCAAAGCAAGCCAGTTTGTTTCAGG 

TCAAGCTGCTCAAGCACAAGACTTTGATACCGAGTTTTTAGACTATATTCTAGCTGTTAA 

GGTTGTGAGCAGTTTAGAAGAAGCGGTTGCGCATATTGAATCCACAGTACCCATCATTCG 

GATGCTATTGTGACGGAAAATGCTGAAGCTGCAGCATACTTTACAGATCAAGTGGACTCT 

GCAGCGGTGTATGTTAATGCCTCAACTCGTTTCACAGATGGAGGACAATTTGGTCTTGGT 

TGTGAAATGGGGATTTCTACTCAGAAATTGCACGCGCGTGGTCCAATGGGCTTGAAAGAG 

TTGACCAGCTACAAGTATGTGGTTGCTGGTGATGGGCAGATAAGGGAGTAAG 

ORF Predictions: 

ORF # Start End Direction Length 



1 1030 1251 F 74 aa 

>[SEQ ID NO: 171] 3864956-2 ORF translation from 1030-1251, direction F 
VTENAEAAAYFTDQVDSAAVYVNASTRFTDGGQFGLGCEMGISTQKLHARGPMGLKELTS 
YKYVVAGDGQIRE*

Description: 
gamma-glutamyl phosphate reductase (proA) homolog - Haemophilus
influenzae (strain Rd KW20)

Assembly ID: 3864958 
Assembly Length: 1785bp

>[SEQ ID NO:74] 3864958 Strep Assembly — Assembly id#3864958 

CTGCCCTAGCAGGAACGCAAGAAGGAACTGGAGAATAGGCATTTTCAAAATTATAACCTA 

CACTAGCCATCATATCTAATGTTGGAGTGCTAACTAGCTTATCCTTACTATTCAAGGATA 

AGGCGTCTGCTCTCATTTGATCTACAACAATCAAAATAATATTTGGTTGTTTTGTCTGAA 

CCATAAAATCTCCTTTCTAATATGGCAAAAGAGGCACAAGAAGATATCTACCTTTACTGC 

ACCCCTTTCTATATCAATCTCTCTATATAAAGCAATAACATTCTTGTTATGTTTTATAGA 

ACAATGGACTAAAATATGACTAAATCGATTAGGAAATTCAAATCATTTTCTAGTACTGTT 

TTAGTAAGTTACAGTGTACTATTCCAACTTCAATAAATTATAAACCTTTGTCTAATAACA 

ATTTTAGTGGAGATAAGAAATCCTACACCTAACTCATCTTACACGTAATCTATTTCTATT 

TTATCACAAAAAACGCAAGTAAGACCATTAACTCAATTCAGTTTTATCTGCCATTTTCAC 

AAATGGGAAATAAGTCAAGACACTAATAATCAAACAAACAACTGATAAGATGATGGCACG 

CCAATCAAATGCTGTAGAGAAGAAACCATATAAAATTGGAGGCATTACCCAAGTAACATT 

TTGTGTAACAGGTGAAACAAGACCCCAGCTTGTTGCCCAGTAAGCTACCGTTGCCATGAA 

AACCGGGCTAAGTACAAATGGTATAAATAGCAAAGGATTCAAGACAACTGGTAAACCATA 

ATTCGATACCGGCTCACCAATATTAAACAGAACTGGTGCTAGACCAAGTTTAGCAACTTT 

TCGATAATGACTGTTTCTTGAAAAAATTAAAATAGCAAGTACTAATCCTAATCCTCCAAA 

CCAGACAAACGCCCCAAAAGACCCACTTGTCCATATATAAGGAATCGGTTCACCTTTTTG 

GAAAGCATCCAGATTCGCTAACATAGCAACTCCAAATAGCCCTTCCATGATGGGAGCCAA 

TACATTTCCTCCATGGAGACCAAAAAACCAGAATAACTTATTCAAAAAGATCATCAGAAT 

AACTGCAAAGAAACTTTGAGACAAACCTAGTAATGGCGTTTGTAACACCTTGTAAACCCA 

ATCAATCAATAAGTCATTGCTAAGTAAATGGAAAACATAAGTCAAGATGGCTACTATATA 

CATCGCCATAAATCCTGGAATGATAGAAGTGAACGGCTTAGCAATCGCAGGGGGTACTGA

ATCTGGTAACTTGATTACCCAGTTCTTTTTCATTACTTTACAGAAAATAATAGAGGCTAA 

AAATCCAATCATCATGGCTGTAAAGTAGCCTCTGGCATTAATATGGTTTCCTGGAATCAC 

ATTCCCAATAGTTACCATCAGATTTTTACCATCAAATGCTAGATTATCAATTCCATGTTA 

AGATTTGATCTAATTTCACATCTCCTACATTTGCCAAAGGGAAACTCTTTGTAACTGTAC 

TTCCAATCGAAATGACAAACGAAGCAAGTGATACCAAACCAGCAGAAACTGTATCAACCT 

TGTAAATCTTAGCGATATTCACTCCCAAGCAATAGATGAACAACAAGGAAACAATTGGTA 

TACTTCCCTTGAATACCAAATTATTGATGTCAACAAGCCACTGAAAGGTTTTCGTAATAC 

TTCCTAGGTGAAATTGTTGTGGTAAATCCACTAGAAAAGCATTTAATAACAAAGCAATGG 

AACCTGTCATAATAACAGGCATAGTCCCCACAAATGAATCACGTT 

ORF Predictions: 

ORF # Start End Direction Length 



1 1427 1711 R 95 aa 
>[SEQ ID NO: 172] 3864958-2 ORF translation from 1427-1711, direction R

VDLPQQFHLGSITKTFQWLVDINNLVFKGSIPIVSLLFIYCLGVNIAKIYKVDTVSAGLV 

SLASFVISIGSTVTKSFPLANVGDVKLDQILTWN*

Description:
unknown 

Assembly ID: 3865022 
Assembly Length: 1386bp

>[SEQ ID NO: 75] 3865022 Strep Assembly -- Assembly id#3865022 

ATCGAATTTCATTTCTATTTCCTATTCCATTTTTATTCAAAAAATCAAAAAGCAAACTAG 

AAAGCTGGTCGCTGGTGGTTCAAAACACTGTTTTGAGATTGTCAATAGAACTGACAAACC 

CTGTAATATACCTGCATATATACATACGACAAGGCGATACTACCCTAGTTTGAAGAGATT 

TTCGAAGAGTATTCATTTTTGTCTTTTACTTATTATACCATATTCACATAAAAAAACGAA 

CATTCTTATCCTAAAAAATGCTCATTTTTCTTAAATTATCAATCTAAATCTGGTTTATAG 

AAGGAACGATTATCCATAGCGAAGATTTTATTGGTCATCTCTCCTTTATCCACCAAAGCC 

AGAGCTGTTGACATCATCATCATGCTTGCATCCAGATTGTCAATCATATGGATAATCTCT 

GCCTCCATAATACGTGGACGGACTGGAATTTCCATATTCAAGCAAGCCGTGGTGGACTTG 

AGGATGACATGACGAAGCAAAACGACTTCTTCCTTGGTATCATCGATGCCGAGTTCCATA 

ACTGTCTTGGTAATTTCGCTATCAATGAGAGCGATATGTCCAAGAAGATTACCTCGCACT 

GTGTACTCTGTCTGGTCTGGCCCCGTCAACTCGATAACCTTAGCTAAGTCATGCAGCATA 

ATCCCCGCATAGAGCAGGCTCTTATTGAGCTGAGGATAAACTTCGCTAATAGCGTCTGCC 

AAACGTACCATGGTCGCCGTATGATAAGCCAACCCCGTTTCAAAGGCATGGTGGTTGGTC 

TTGGCGGCTGGATAGGAGTAGAATTCCTTATCATACTTGGTGTAGAGATTTCGGACAATC 

CGTTGCCAGACAGGATTTTCAATTTTGAAAATCATTTGCGACATGTAGTCACGAATTTCC 

TTGACATCAACTGGTGACTTGACCTTGAAATCAGCTGGGTCATTGGGTTCACCAGCTTGA 

GGCAGGCGGAGAGTAATTTGATTGACTTGAGGGGTATTGTTATAAACTTCTCGGCGTCCT 

TTCATGTGGACAACCTTACCTGCGGTAAAGGCCTCAATGTTATGAGGTTGGGCATCCCAG 

AGCTTCCCATCAATCTCGCCACTATCATCTTGGAAGGTAAAGGCTAGGTAGTTTTTCCCA 

GCTCGAGTTTGCCTCAGGTCAGCTGATTTGATTAGGTAAAAGCCTTCAAATAACTCATCT 

TTTTTCATGTGACTAATCTTCATATTCTTCCTCATTTTCTTGAAAATGGAGTAGATCAAG 

CGCAGGCTCACCTTCTGACAACTCAATGTGACGGAGCGTCCGCTCGATAGCTATGGTACG 

ACGGTTTAATAATTCGATCAATATTGCCAGAGGCATGTTGGAGATGTTTTTGTGCCTTGA 

CCAGAA 

ORF Predictions: 

ORF # Start End Direction Length 



1 279 1271 R 331 aa 

>[SEQ ID NO:173] 3865022-1 ORF translation from 279-1271, direction R 
VSLRLIYSIFKKMRKNMKISHMKKDELFEGFYLIKSADLRQTRAGKNYLAFTFQDDSGEI
DGKLWDAQPHNIEAFTAGKVVHMKGRREVYNNTPQVNQITLRLPQAGEPNDPADFKVKSP
VDVKEIRDYMSQMIFKIENPVWQRIVRNLYTKYDKEFYSYPAAKTNHHAFETGLAYHTAT
MVRLADAISEVYPQLNKSLLYAGIMLHDLAKVIELTGPDQTEYTVRGNLLGHIALIDSEI
TKTVMELGIDDTKEEVVLLRHVILKSTTACLNMEIPVRPRIMEAEIIHMIDNLDASMMMM
STALALVDKGEMTNKIFAMDNRSFYKPDLD*

Description:

gi|710422 (U21636) cmp-binding-factor 1 [Staphylococcus aureus]
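For direction-R entries the Start and End values still refer to the depicted strand, with Start smaller than End. In assembly 3865022, the CAC at positions 1269-1271 is the reverse complement of the initiating GTG, and the TCA at 279-281 is the reverse complement of the TGA stop. The protein is therefore obtained by reverse-complementing the inclusive span and then translating it. The Python below is a minimal sketch that assumes this convention and the standard genetic code (NCBI translation table 1). The names `reverse_complement`, `translate` and `reverse_orf` are hypothetical.

```python
# Standard genetic code (NCBI translation table 1); codons indexed in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Complement table; N maps to N, other characters pass through unchanged.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string (upper-cased)."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate codon by codon: '*' marks a stop, 'X' a codon containing a non-ACGT base."""
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    protein = []
    for i in range(0, usable, 3):
        codon = dna[i:i + 3]
        if all(base in BASES for base in codon):
            index = 16 * BASES.index(codon[0]) + 4 * BASES.index(codon[1]) + BASES.index(codon[2])
            protein.append(AMINO_ACIDS[index])
        else:
            protein.append("X")
    return "".join(protein)


def reverse_orf(assembly: str, start: int, end: int) -> str:
    """Translate a direction-R ORF: reverse-complement the 1-based, inclusive span start..end."""
    return translate(reverse_complement(assembly[start - 1:end]))


if __name__ == "__main__":
    # Depicted-strand text at positions 1239-1271 of assembly 3865022; its reverse
    # complement begins with the initiating GTG.
    fragment = "CTTGAAAATGGAGTAGATCAAGCGCAGGCTCAC"
    print(translate(reverse_complement(fragment)))  # VSLRLIYSIFK
```

Reverse-complementing the slice before translating keeps a single translation routine for both strands, so the coordinates in the ORF Predictions tables can be used unchanged.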

Assembly ID: 3865036 
Assembly Length: 1167bp 

>[SEQ ID NO:76] 3865036 Strep Assembly — Assembly id#3865036 

CTCAGATTACAGAGGACAATCAACTGGTTCATTTTCGTTTCCAGTTTCAAAAAGGCTTAG 

AAAGGGAGTTCATCTATCGTGTGGAAAAAGAAAAAAGTTAAGGCAGGTGTTCTCCTCTAC 

GCAGTCACCATAGCAGCCATCTTTAGTCTTTTGTTGCAATTTTATTTGAACCGACAAGTC 

GCCCACTATCAAGACTATGCTTTGAATAAAGAAAAATTGGTTGCTTTTGCTATGGCTAAA 

CGAACCAAAGATAAGGTTGAGCAAGAAAGTGGGGAACAGGTTTTTAATCTAGGTCAGGTA 

AGCTATCAAAACAAGAAAACTGGCTTAGTGACGAGGGTTCGTACGGATAAGAGCCAATAT 

GAGTTTCTGTTTCCTTCAGTCAAAATCAAAGAAGAGAAAAGAGATAAAAAGGAAGAGGTA 

GCGACCGATTCAAGCGAAAAAGTGGAGAAGAAAAAATCAGAAGAGAAGCCTGAAAAGAAA 

GAGAATTCCTAGTCAATTCAACTATAATGCGTTGAATCCAGAATAGTCCACTGTAGTTTC 

TAGAAAATTGCTGGAAATGGATGTTAAGCTCCAATTCATTTGTTTATATCTTATTTCAGT 

CCACTATACTTTGTGCTAAATTAAAGATATGAAACATGATTTTAACCACAAAGCAGAAAC 

TTTCGATTTCCCTAAAAATATCTTCCTCGCAAACTTGGTATGTCAAGCAGCCGAGAAACA 

GATTGATCTTCTATCAGACAAAGAAATTTTAGATTTCGGTGGTGGCACGGGTCTATTAGC 

CTTGCCCCTAACCCCTAGCCAAGCAGGCTAAGTCAGTCACTCTTGTAGACATTTCTGAGA 

AAATGTTGGAGCAAGCTCGTTTGAAAGTGGAGCAGCAAGCAATCAAGAATATCCAGTTTT 

TGGAGCAAGATTTACCGAAAAATCCCTTGGAGAAAGAGTTTGATTGCCTTGCTGTTAGTC 

GGGTTCTTCATCATATGCCTGATTTGGATGCGGCTCTCTCACTGTTTCATCAACATTTGA 

AGGAAGATGGGAAACTCATCATTGCTGATTTTACCAAGACAGAAGCTAATCATCATGGAT 

TTGATTTAGCTGAACTGGAAAACAAGCTAATTGAGCATGGGTTTTTCATCTGTGCATAGT 

CAGATNCTCTATAGCGCTGAAGANCTG 

ORF Predictions: 

ORF # Start End Direction Length 



1 79 492 F 138 aa 

>[SEQ ID NO: 174] 3865036-1 ORF translation from 79-492, direction F 
VWKKKKVKAGVLLYAVTIAAIFSLLLQFYLNRQVAHYQDYALNKEKLVAFAMAKRTKDKV 
EQESGEQVFNLGQVSYQNKKTGLVTRVRTDKSQYEFLFPSVKIKEEKRDKKEEVATDSSE 
KVEKKKSEEKPEKKENS*
Description:
unknown 

Assembly ID: 3865054 
Assembly Length: 916bp 

>[SEQ ID NO: 77] 3865054 Strep Assembly -- Assembly id#3865054 

TCTCCCAACATATAATTTCCGTTTTCCAATCCCCCAGCTGTCATACAGTCTGTGATAAGA 

GCGATGTTTTCTGTTCCTTTTTGTTTGATAAGAATTTCGCAAGCCTTTGGATCTACGTGG 

TGACCATCACAGATCAACTCTGCATAGGTATGTGGCAATTGGTACATGGCTCCAACCATA 

CCCAATTCACGGTGAGTCAACCCACGCATTCCATTGTAGGCATGCACCCAAACACTCGCT 

CCAGCATCGACTGCTTTTTTGGCTTCATCAAAAGTCGCGTTTGAATGTCCAAGAGCAACC 

GTCACACCTTCGCCCGTAACTGTACGAACAAAGTCTTCCACCCCATCACGTTCTGGTGCA 

ATCGAATTTTATTAAGCAAGCCATTTGCCGCTTTTTGCCAAGAATGAAACTCCTCAACAC 

CCGGGTCTCTCATATAAGTTGGATTTTGTGCCCCCTTAAAAGTTTCTGTGAAATATGGAC 

CTTCATAATAAATCCCACGAATCTTAGCACCTGTTGCTTCTTTATAATGGTTTCCAAGAT 

TTTCAGTGACTGCAAGCAATTGCTCATAAGTGGCTGTTAAAGTTGTGGGTAAGAAACTGG 

TAACACCGGTACTAAGAAGTCCTTCACTCATAGTATGCAATGTACCTTCAATGTTGTTGT 

CCATCACATCTACACCTGCATATCCATGAATATGAGTATCCACAAGACCTGGGGCAATGC 

TATAACCTGTATAGTCAATCACCTCAGCCCCTTCAGGAATCTGCTCTACATGTTTCCCAA 

ACTTGCCGTCCACAAGTTCCAAGTAACCACCTCGACAAATCCGTGTGGGTAGAAAAACTG 

ATCCGCTTTAATATAGTTAGGCATAATGTTAACCTCCTTAAAAGATTGATTCTACAATTT 

ATTATGTCAATTCGAT 

ORF Predictions:

ORF # Start End Direction Length 



1 302 793 R 164 aa 

>[SEQ ID NO: 175] 3865054-1 ORF translation from 302-793, direction R 
VDGKFGKHVEQIPEGAEVIDYTGYSIAPGLVDTHIHGYAGVDVMDNNIEGTLHTMSEGLL 
STGVTSFLPTTLTATYEQLLAVTENLGNHYKEATGAKIRGIYYEGPYFTETFKGAQNPTY 
MRDPGVEEFHSWQKAANGLLNKIRLHQNVMGWKTLFVQLRAKV* 

Description: 

N-acetylglucosamine-6-phosphate deacetylase (nagA) homolog -
Haemophilus influenzae (strain Rd KW20)

Assembly ID: 3865102 
Assembly Length: 786bp

>[SEQ ID NO:78] 3865102 Strep Assembly — Assembly id#3865102 
CTGGATTAAAACGAGGCAGTTTCAGACTAATATCCAAGTCGTAAGAAATGCCTGAAATAA 
GCTTTTCTAAATTGTCCAAAGCTTGCGGGAAAACGCTCTTGGAATAGTTTCTCTAAAGAA
CTTGCTGATATAAAGACATCTTGTCTCGAACGCAAGGGAACTTCTCTGAGCGGTAGATTT 
TCTTTAATCGCTGTTAAAACTTGAAGAACTTCTCTATCCCTGCTTTCAAAAGCGTTGACC 
CGATAAAGAGGTAAGATAGGATGATGAAATTCGCTTGCTAGTGTTTCTGGATAAACCCCT 
ATATAGTAATCACAGCCTAGTTCTAACGACTCAACTCTATCAAAATAAGGCACAATGACC 
GCGATATCCTCCAGGTACTGGGACAGGACTGACCAAGTTTTCTCCCCCTGCATCTTGGCT 
GTCGAAAGCTTCATCAACTGCTGATAGCCCACACTAGATAGAGCTAAAAAGCGCAAATTC 
ACTTCCTGATCATCTACAAACACTGTCATTTCAAGCCCTAGCAAAGGATGAATGCCGTAT 
TTTTTTGTAATCTCTAGAAAGTCGAAAGCGCCATAAAGATTGTCAATATCCATCATAGCC 
AAATGAGTGTAGCCGTATTCTTTAGCTGCTCTCACATACTTTTCGATCGAAATGACGCTT 
TCCATAAAACTATAGACTGTTTTTGTATCTAGTTGTGCGATCAATTTACACTTCTCCTCT 
ATCCTTCTCACTATATTATACCATTTTCACCTATAAATGGCTTCTCTTGAGAAAAATTTC 
GATCAG 

ORF Predictions: 

ORF # Start End Direction Length 



1 27 731 R 235 aa 

>[SEQ ID NO: 176] 3865102-1 ORF translation from 27-731, direction R

VRRIEEKCKLIAQLDTKTVYSFMESVISIEKYVRAAKEYGYTHLAMMDIDNLYGAFDFLE 

ITKKYGIHPLLGLEMTVFVDDQEVNLRFLALSSVGYQQLMKLSTAKMQGEKTWSVLSQYL 

EDIAVIVPYFDRVESLELGCDYYIGVYPETLASEFHHPILPLYRVNAFESRDREVLQVLT 

AIKENLPLREVPLRSRQDVFISASSLEKLFQERFPASFGQFRKAYFRHFLRLGY* 

Description: 
unknown 

Assembly ID: 3865156 
Assembly Length: 1213bp 

>[SEQ ID NO: 79] 3865156 Strep Assembly -- Assembly id#3865156 
CACTTTCAGCTTCTTCTCTTTTTGAACGGTTATAAACACGAATCAGATTCCCTATTTCTT 
GCGATTTATGTGATTCCTTATTTTCCAATCTAAAGTATAGTGAAATGAAATAAAACATGC 
GCAAATCGATTAAGGAATTTAATCTAATTTCTAACAATGTCTTAGAAATCAAAGTGTACT 
ATTTTAACTTCAATGCACTAAACATCTAATACTCAATAAAAATCAAAGAGCAAACTAGGA 
AACTAGCCGCAGGTGGCTCAAAACACTGTTTTGAGGTTGTAGATGAAACTGACGAAGTCA 
GTAACCATACATACGGCAAGGCGACGCTGACGTGGTTTGAAGAGATTTTCGAAGAGTAGC 
AAAATGGAAAAAGGAGTGAGTGAAGCACATCGCCTCCCCACTCCTTTTTCTGTTTTTAGG 
CTGTTTTTTCAACCTTCAAGATTTTTACATCATAGCTACCAACAGGCGTTTCAATGGTTG 
CTGTATCACCTGTTTTCTTGCCAATCAAGGCCTGCCCAATTGGGCTTTCATTTGAAACCT 
TACCTGCAAAGGCATCCGCACCAGCTGAACCTACGATAATATAAACTTCTTCTTCGTCCT 
CACCAATTTCTTGGATGGTGACTGTTTTACCAATCGCTACTTCGTCCTGGGCAACTGCGT 
CGCTATTGACGATTTCAGCATAGCGGATTTTTGTTTCTAAGCTAGAGATTTGTCCTTCGA
CAAAGGCTTGTTCATCCTTAGCTGCTTCGTACTCACTGTTTTCTGAAAGGTCACCGTATG 
AACGGGCAATCTTAATGCGTTCTACCACTTCTGGTCGACGAAACCAATTTCAATTCTTCT 
AATTCTTTTTCAAGTTTTTCCTTTTCCTCAAGGGTCATAGGATATGTTTTTTCTGCCATT 
TTTCTCAACTTTCTTCTGATAATATTTTCTAAAGAAAATTATGTGAAGTATCACATAATT 
TTAGTTTGTTTAGTTTAATTTGCTGTTGACATGTTCAGCGACATTGCGGTCGTGGTCTTC 
TTGATTGTTAGCATAGTAAACCTTGCCTTCTGTGACATCTGCTACAAAGTAAAAGTTATC 
GCTCTTAGTTTGATTGATGCTTGACTCAATCCGCATCCAAGACTTGGACTATCGACTGGA 
CCAGGCATGAGACCTACATTTTTATAAACATTATAAGGTGAATCAATGTTGGTATCAATC 
GCAACATCCTCAG 

ORF Predictions: 

ORF # Start End Direction Length 



1 416 808 R 131 aa 

>[SEQ ID NO: 177] 3865156-1 ORF translation from 416-808, direction R 
VVERIKIARSYGDLSENSEYEAAKDEQAFVEGQISSLETKIRYAEIVNSDAVAQDEVAIG
KTVTIQEIGEDEEEVYIIVGSAGADAFAGKVSNESPIGQALIGKKTGDTATIETPVGSYD
VKILKVEKTA*

Description:

TRANSCRIPTION ELONGATION FACTOR GREA (TRANSCRIPT CLEAVAGE FACTOR
GREA). - ESCHERICHIA COLI.

Assembly ID: 3865160 
Assembly Length: 1173bp 

>[SEQ ID NO: 80] 3865160 Strep Assembly -- Assembly id#3865160 

TGCGGCTGAGTTGGGAATTCCTATCGTTAATAAGCGTGTATCGGTGACACCTATTTCTCT 

GATTGGGGCAGCGACAGATGCGACGGACTACTGGTTCTGGCAAAAGCGCTTGATAAGGCT 

GCGAAAGAGATTGGTGTGGACTTTATTGGTGGTCTTTCTGCCTTAGAACAAAAAGGTTAT 

CAAAAGGGAGATGAGATTCTCATCAATTCCATTCCTCGCGCTTTGACTGAGACGGATAAG 

GTCTGCTCGTCAGTCAATATCGGCTCAACCAAGTCTGGTATTAATATGACGGCTGTGGCA 

GATATGGGACGAATTTATCAAGGAAACGGCAAATCTTTCAGATATGGGAGCGGCCAAGTT 

GGTTGTATTCGCTAATGCTGTTGAGGACAATCCATTTATGGCGGGTGCCTTTCATGGTGT 

TGGGGAAGCAGATGTTATCATCAATGTCGGAGTTTCTGGTCCTGGTGTGGTGAAACGTGC 

TTTGGAAAAAGTTCGTGGACAGAGCTTTGATGTTAGTAACCCGAAAACCAGTTAAGAAAA 

CTGCCTTTTAAAATCACTCCGTATCCGGTCCAATTGGTTTGGTCAAATGCCCAGTGAGAG 

ACTGGGTGTGGAGTTTGGTATTGTGGACTTGAGTTTGGCACCAACCCCTGCGGTTGGAGA 

CTCTGTGGCACGTGTCCTTGAGGAAATGGGGCTAGAAACAGTTGGCACGCATGGAACGAC 

AGCTGCCTTGGCCCTCTTGAACGACCAAGTTAAAAAGGGTGGAGTGATGGCCTGTAACCA 

GGTCGGTGGTCTATCTGGTGCCTTTATCCCTGTTTCTGAGGATGAAGGAATGATTGCTGC 
AGTGCAAAATGGCTCTCTTAATTTAGAAAAACTAGAAGCTATGACGGCTATCTGTTCTTG
TTGGATTGGATATGATTGCCATCCCAGAAGATACGCCTGCTGAAACTATTGCGGCTATGA 
TTGCGGATGAAGCAGCAATCGGTGTTATCAACATGAAAACAACAGCTGTTCGTATCATTC 
CCAAAGGAAGAGAAGGCGATATGATTGAGTTTGGTGGTCTATTAGGAACTGCACCCGTTA 
TGAAGGTTAATGGGGCTTCGTCTGTCGACTTCATCTCTCGCGGTGGACAAATCCCAGCAC 
CAATTCATAGTTTTAAAAATTAAGAAAATAGGA 

ORF Predictions: 

ORF # Start End Direction Length 



1 136 375 F 80 aa 

>[SEQ ID NO: 178] 3865160-1 ORF translation from 136-375, direction F
VDFIGGLSALEQKGYQKGDEILINSIPRALTETDKVCSSVNIGSTKSGINMTAVADMGRI
YQGNGKSFRYGSGQVGCIR*

Description:
unknown 

Assembly ID: 3865172 
Assembly Length: 1209bp

>[SEQ ID NO: 81] 3865172 Strep Assembly — Assembly id#3865172 

TCGGAATCTGAGCTAGTGTAGCTTCCTTAATCTTATCTGATAAGATAGCTGTCATATCAG 

ACTCAATCATTTCCTGGAGCAATCAACATTGACTCGTATATTCCGACTAGCGACCTCGCG 

TGCCACAGACTTGGTAAAGCCAATCAAGCCAGCCTTAGAAGCAGCATAGTTAGCTTGACC 

AATATTCCCCATCAAACCAACAACACTAGACATATTAATGATAGCACCTTCTCTGGCTTT 

CATCATCGGTTTCAAGACTGATTGTGTCATATTAAAGGCACCAGTCAGATTGACCTTGAG 

CACTTTTTCAAAATCTGCTTCTGTCATCTTGAGCATAAGAGTATCTTGGGTAATCCCTGC 

ATTGTTGACCAAAACATCTACTGAACCCAGTTCTGCAATAGCTTGATCAATCATACGCTT 

AGCGTCTGCAAAATCTGATACATCTCCTGAAATGGGAACCACCTTGATACCATAGTTTGA 

AAACTCAGCGAGCAATTCTTCTGAGATTGCCCCACGACTGTTTAAGACAATGTTGGCTCC 

TGCTTGAGCAAACTTGTGGGCGATGGCAAGACCAATTCCACGACTCGAACCTGTAATAAA 

GATATTTTTATGTTCTAGTTTCATTTTTTTCCTTTCAAAACTTCTACTTATTTTAGTCTA 

TTTTTCTAAAAGTGCTACTAAACTCGCTTGATCTTCCACATGAGCTAAGTGAGCAGTTTG 

ATCAATTTTTTTAACAAAACCTGACAAGACTTTCCCCGGTCCAATCTCGAATAAAGTTGC 

TTATGCCTGCTTCTTGCATGACCCCAATACTTTCATAGAAACGAACGGGTTCCTTGACCT 

GACGCGTCAAGAGCTGAGCAATGTCCTCTTTTTGCATCACAGCAGCTTCTGTATTGCCGA 

CTAGGGGACAAGTAAAATCTGAAAAACTTACCTGAGCTAGAGTTTCAGCTAGTTTCTGGC 

TAGCAGGCTCAAGGAGAGCGGTGTGAAAGGGACCTGACACCTTAAGAGGAATCAAGCGTT 

TGGCACCTGCTTCTTGCAAAAGTTCAACCGCTCGATCAACTGCAACCACTTCTCCAGCAA 

TGACGATTTGTGCAGGTGTGTTATAGTTGGCTGGAGTAACCACTCCAAGTTCCAGAAGCT 

TTTTGACAGGCTTCTTCAATGACCTCTACTGGCGTATTGAGAACTGCTACCATCTTGCCA 
AGTTCAGCA

ORF Predictions:

ORF # Start End Direction Length 



1 731 1123 R 131 aa 

>[SEQ ID NO: 179] 3865172-2 ORF translation from 731-1123, direction R 
VVTPANYNTPAQIVIAGEVVAVDRAVELLQEAGAKRLIPLKVSGPFHTALLEPASQKLAE
TLAQVSFSDFTCPLVGNTEAAVMQKEDIAQLLTRQVKEPVRFYESIGVMQEAGISNFIRD
WTGESLVRFC*

Description:

malonyl coenzyme A-acyl carrier protein transacylase (fabD) homolog -
Haemophilus influenzae (strain Rd KW20)

Assembly ID: 3865228 
Assembly Length: 813bp 

>[SEQ ID NO: 82] 3865228 Strep Assembly — Assembly id#3865228 

ATGACACGTCTGTTCTCTCAAGCAGAAATGGCAGAGTAACAAGCTCGATATTGAGGTAGC 

CGATAAAGAATTGGCTGAATTTGAAGCTCAGATTAAACAGGAAGTGGAAGCTCCAACTTG 

TAGTGAGTCCTCAGGTTGAAGAAGAGCCTCAGCTCATCCAGTTGGCCCAATGTATGAAGA 

ACCAGAAGTAAATCCAGTGCATCCGACAGGTCCAACACCAGCTACAGAAACTGTTGATTC 

AATACCGGGATTTGAAGCACCGCAAGAATCTGTTACAATTTTATAAGAAATATTCTGAGA 

ACAATATCTTATCCTTATATTTCCAGCGAGCAGGAAATGGTGTGAGTCCTGCATTCCCTA 

TCGATAAGATTATCCTCTCAAACTATCAAGTCTGAATCTAGTAAGATTTGACGTTCCCCA 

CGTTACGGGATAAGAGAGAGAAAGACTAAATCTTTTTCCGAATAAAGGTGGTACCACGAT 

TTTCGTCCTTTTTGGAAGTCGTGGTTTTTAATTTGTTATTATTTATAAAGGAGATACCAT 

GAAACTCAAAGACACCCTTAATCTTGGGAAAACTGAATTCCCAATGCGTGCAGGCCTTCC 

TACCAAAGAGCCAGTTTGGCAAAAGGAATGGGAAGATGCAAAACTTTATCAACGTCGTCA 

AGAATTGAACCAAGGAAAACCTCATTTCACCTTGCATGATGGCCCTCCATACGCTAACGG 

AAATATCCACGTTGGACATGCTATGAACAAGATTTCAAAAGATATCATTGTTCGTTCTAA 

GTCTATGTCAGGATTTTACGCGCCATTTATTCC 

ORF Predictions:

ORF # Start End Direction Length 



1 197 286 F 30 aa 

>[SEQ ID NO:180] 3865228-1 ORF translation from 197-286, direction F 
VHPTGPTPATETVDSIPGFEAPQESVTIL* 
Description:
unknown 

Assembly ID: 3865230 
Assembly Length: 953bp 

>[SEQ ID NO: 83] 3865230 Strep Assembly — Assembly id#3865230 

ATCGAATTATTTTGAAACAAGGTGGATCAGCTATTTTGGCCTTGATTAGTATTTTACTCT 

TTAAATACACTTGAAGGTCGATTCTAATCTCGCTAATCCTTTTTAATCCAGAATAAGGGA 

AATATGTTATACTTGTTTTTAAGAAAAAAGTTTCATTGAATTGGTTTTGAGGAGTTAGAA 

ATGAAAGTATTAGTGACAGGTTTTGAGCCCTTTTGAGGCCATTAAAGGTTTACCAGCTGA 

AATCCATGGTGCTGAGGTCCGTTGGCTAGAGGTGCCGACAGTTTTTCACAAATCTGCTCA 

AGTATTGGAAGAAGAGATGAATCGTTATCAACCTGACTTTGTCCTTTGTATTGGGCAAGC 

TGGTGGAAGAACTAGTTTGACACCTGAACGAGTGGCCATTAATCAAGACGATGCACGTAC 

TTCTGATAACGAAGATAATCAACCGATTGACCGTCCCATTCGCCCAGATGGTGCTTCGGC

CTACTTTAGTAGTTTGCCGATTAAAGCGATGGTTCAAGCTATAAAAAAGAAGGATTACCG 

GCCTCTGTTTCCAATACGGCAGGGACTTTTGTCTGCAGCCATTTGATGTATCAGGCTCTC 

TATTTGGTAGAAAAGAAATTCCCATATGTTAAGGCAGGTTTTATGCATATTCCTTATATG 

ATGGAACAGGTGGTGAACAGACCGACTACTCCAACTATGAGTTTAGTGGATATTCGGCGA 

GGGATAGAAGCAGCAATCGGCGCTATGATAGAACATGGAGATCAGGAACTCAAGTTGGTA 

GGCGGAGAAATTCATTGATAGAAAAAAGCTTGAGGGGAAAACCTTCAAGCTTTTGGACGT 

TTTCGAGCCAATACTGCTCGGTAAAACATAATTTTAGTGCATTGGATATAAGGTAGGAGT 

GAAAAACTAGCAATGCCAAAGGTAATCCAATTGAGGAAGTACCAAGGAAGAAG 

ORF Predictions:

ORF # Start End Direction Length 



1 272 586 F 105 aa 

>[SEQ ID NO: 181] 3865230-1 ORF translation from 272-586, direction F 

VPTVFHKSAQVLEEEMNRYQPDFVLCIGQAGGRTSLTPERVAINQDDARTSDNEDNQPID 

RPIRPDGASAYFSSLPIKAMVQAIKKKDYRPLFPIRQGLLSAAI* 

Description:

PYRROLIDONE-CARBOXYLATE PEPTIDASE (EC 3.4.19.3) (5-OXOPROLYL-
PEPTIDASE). - STREPTOCOCCUS PYOGENES.

Assembly ID: 3865378 
Assembly Length: 1060bp

>[SEQ ID NO: 84] 3865378 Strep Assembly — Assembly id#3865378 

CTACTTGAAACAGAACTGAAATTATACCCACTACCTCCCTGATTATCTTCAATGCTTACG 

TCTAAATAAACTTCCCCACTATTATTTAGCTTAGCAACAACTGTTATAGTAAAATAACAT 
AAAATTCACATAAATAGATTAGGGAAATCAAAGCAACTTCTAGGAATGTTTTAGCAGTCA
CAGTGTACTTTCCCAGCATCAAGCCACTATAACTCTGCACATAAAAATGGAGAAGATGGC 
CATCCTCTTCTCCAAATATTAACTTCTTTACAAACCAACTATAGTTGACAAAGAACCTAA 
AATCAATTGATAACACGAGGTCAGGTCGGTCAACTCTTTCAACTGAAGCCCTGTCAACTC 
TTCCCATTTATCAATCTTGTATTGGAGAGAATTGCGGTGCAGATAGAGTTGCTGGGCTGT 
TTAAGTGAGAACAGCACTATTTTCCCAAAGAGAGAGAATGATTTCCTGAATCTGATCTTG 
ATCCAAAATCATCTGGTGTAGACATTCCTTGATTGGCTTCAAGTCCACGAGTCTTTCTCC 
CAGACTCCAAAGATAGAGCTGAGAAAAAGTATGAACACCTTGGTGACCCTGACGCCACCA 
TGTCTTGAACAAATCCCGCTCAGCTTTGATTAAGTCTGATAGGGCTTGATGTCCCGTCTG 
AGACCAAACCTGACCCAACATGATAGAAAGACGAAGTCCAAAGTCATACTCAACCGCTTC 
AATCGTATCACTTAAAATATCTCTTACAGAAGTGTATTTGTCTTGTTGAAGCACGAAAAC 
ATAATCCTGAGATCCGACCTGTAGCACTGTCTGACAATTCGGAAAAAGAGTCCGCATCAT 
ATCTAGCCAAGAAGCCAGATTTTCCTGCTGAAAATAAGAAAGATGGCAATAAACCAACTG 
AATCTTTTTAAAAACTTGCGGTGCCTGTCCCTTGCCTTCAACCAGATAGGAATACCAAGG 
GTTTAGCGAACGAACCTGCTCCTGCTGGGTCAAAAGGGCAACCAACTGCTTTTCACGCTC 
GCTGAGCCCAGCTTCCTCCAGCAAAATCCACTGCTGAGAG 

ORF Predictions: 

ORF # Start End Direction Length 



1 421 807 R 129 aa 

>[SEQ ID NO: 182] 3865378-1 ORF translation from 421-807, direction R 
VLQVGSQDYVFVLQQDKYTSVRDILSDTIEAVEYDFGLRLSIMLGQVWSQTGHQALSDLI
KAERDLFKTWWRQGHQGVHTFSQLYLWSLGERLVDLKPIKECLHQMILDQDQIQEIILSL
WENSAVLT*

Description: 
unknown 

Assembly ID: 3865470 
Assembly Length: 895bp 

>[SEQ ID NO: 85] 3865470 Strep Assembly -- Assembly id#3865470

ATTTTAGACTTTGATGACAATCCTCAGGCGGTTATCATGCCCAATCACGAGGGGCTGGAA 

TTGCAGTTGCCAAAGAAGTGTGTTTATGCATTTTTAGGTGAGGAGATCTGACCGCTATGC 

AAGGGAAGTAGGGGCGGATTGTGTCGGCGAATTCGTTTCTGCTACCAAGACCTATCCAGT 

CTCTTTCATCAACTACAAGGGTGAGGAGGTCTGTCTGGATCAGGCTCCTGCTGGCTCCGC 

TCCAGCAGCCCAGTTTATGGATGGGTTGATTGGCTATGGTGTGGAGCAGCTTATCTCTAC 

TGGGACCTGTGGTGTCCTAGCTGATATAGAGGAAAATGCCTTTCTAGTCCCTGTTCGCGC 

TTTGCGAGATGAGGGAGCCAGTTACCACTATGTGGCACCTTGTCGTTATATGGAAATGCA 

GCCAGAGGCTATTGCTGCTATTGAGGAAGTTTTGGAAGACAGAGGGATTCCTTATGAAGA 

AGTCATGACCTGGACGACAGACGGTTTTTACCGAGAAACGGCTGAAAAGGTGGCTTATCG 
TAAGGAAGAAGGCTGTGCTGTTGTGGAGATGGAGTGTTCTGCTCTTGCGGCAGTAGCTCA
ATTGCGTGGGGTTCTCTGGGGTGAATTGTTGTTCACAGCAAATTCTCTAGCGGACTTGGA 
CCAGTACAACAGTCGTGACTGGGGCTCGGAACCTTTTAATAAGGCGCTAAAACTGAGTTT 
AGCAAGTGTCCACCACCTTTAGTTGTACTGGCAAAGGATTTGTTTTATCATAAAATGTCT 
AGCTCATACTTTTCAAAAATATGTTTAAACGAAGTCACCTTCCTCTTGTCCTAAGCATGT 
TTGAAGTTGGGAAAAATCTTTAAAATCAGAAAAACGTATCATATCAGGTTGATGA 

ORF Predictions: 

ORF # Start End Direction Length 



1 98 742 F 215 aa 

>[SEQ ID NO: 183] 3865470-1 ORF translation from 98-742, direction F 

VRRSDRYAREVGADCVGEFVSATKTYPVSFINYKGEEVCLDQAPAGSAPAAQFMDGLIGY 

GVEQLISTGTCGVLADIEENAFLVPVRALRDEGASYHYVAPCRYMEMQPEAIAAIEEVLE 

DRGIPYEEVMTWTTDGFYRETAEKVAYRKEEGCAVVEMECSALAAVAQLRGVLWGELLFT

ANSLADLDQYNSRDWGSEPFNKALKLSLASVHHL* 

Description: 
unknown 

Assembly ID: 3865632 
Assembly Length: 645bp 

>[SEQ ID NO: 86] 3865632 Strep Assembly -- Assembly id#3865632 

AGGGCTGTCAAGCTTGGTTAGAACGTTTAGAAAAGGAGAGTTAAGGTGGAAAATCTTACG 

AATTTTTACGAAAAGTATCGTGTCTATCTGACTCGTCCACGTTTAGAGCTTTTGGCAGTA 

GTTACCATTGTTTTANGNGCTGTACTCGTCTTTTTTCTAAATATTCCAGGAAAAGGTGTC 

TTAAAACTCGATAATGGAACGATTGTTTATGATGGCAGTCTTGTCCGTGGTAAAATGAAT 

GGCCAAGGTACCATTACCTTCCAAAATGGAGACCAATATACAGGTGGCTTCAACAATGGA 

GCCTTCAACGGAAAAGGTACCTTTCAATCTAAAGAAGGCTGGACCTACGAAGGTGATTTT 

GTAAATGGTCAGGCTGAAGGAAAAGGGAAACTAACAACAGAACAAGAAGTCGTTTATGAA

GGAACTTTTAAACAAGGCGTTTTTCAACAAAAATAAAGCCTCCTTATCAAAGGAGGTATT 

ATTAGAATTACAAGGTAAGCGTTTACCTGTAAATCCCTTTCTTTCCAAATCCCTCTTCCA 

AGCAAGTTTGTGAAATAAAAAATATTTGAAATAAATTTCACAAACTTCAAAGATAAAACC 

TGATAAGAAAAGAAAATGAGAAAAGTTTCGCAAGAGTTTAAAAAT 

ORF Predictions:

ORF # Start End Direction Length 



1 46 456 F 137 aa 

>[SEQ ID NO: 184] 3865632-1 ORF translation from 46-456, direction F 
VENLTNFYEKYRVYLTRPRLELLAVVTIVLXAVLVFFLNIPGKGVLKLDNGTIVYDGSLV
RGKMNGQGTITFQNGDQYTGGFNNGAFNGKGTFQSKEGWTYEGDFVNGQAEGKGKLTTEQ
EVVYEGTFKQGVFQQK*

Description: 
unknown 

Assembly ID: 3865710 
Assembly Length: 572bp 

>[SEQ ID NO: 87] 3865710 Strep Assembly -- Assembly id#3865710 

GAGATCTGTCTTGACACCAAAAGTGTGGAGTACGCCAGCTAATTCAACGGCGATATAACC 

AGCGCCTAGAATCGCAATTGACTCTGGAAGTTCTTCCCAGGCAAATACATCATCAGAAGA 

GCCACCTAGCTCAGCACCAGGAATATTAGGAATACTTGGATGGGCACCTGTAGCAATCAC 

GATATGTCTAGCACGAATCAGTTCACCATTTACGCTTACAGTATGAGAATCTACAAATTC 

AGCATGACCTTCAATCAAGTCTACACCGTTGCGTTTAAAACTACCATCATAGAGAAGAAC 

GAGCGCGATCAATGTAGGCTTCACGATTGCGACGTAGGGTTGCAAAGTTAAAGTTAAGAT 

CAGTAGTCTCAAAGCCGTAGTCTCCTCCAAATTGATGGAAAGTCTCAGCGATTTGCGCCC 

CGCTACCACATGATTCTTTTAGGAACACAACCGACGTTGACACAGGTTCCACCTAATTTC 

TTTTCCTCAATAACGGCTGCTTTGGCTCCATGTTCCCAGCACGGTTCATGGTAGCGATCC 

TCCGCTACCTCCACGATAGCAATGATATCATA 

ORF Predictions: 

ORF # Start End Direction Length 



1 287 448 R 54 aa 

>[SEQ ID NO: 185] 3865710-1 ORF translation from 287-448, direction R
VFLKESCGSGAQIAETFHQFGGDYGFETTDLNFNFATLRRNREAYIDRARSSL* 

Description: 

glutathione reductase (NADPH) (EC 1.6.4.2) - Streptococcus
thermophilus
Provided in Table 2 is information on the direction of the ORF (forward or reverse)
for each polynucleotide in Table 1. Also listed for each ORF are its start and stop codons
and their positions. The first pair of columns labeled "Start" and "Stop" gives the triplet
sequence of each start and stop codon, and the second pair gives the corresponding nucleotide
positions. These codons may be shown in the sense orientation or the antisense orientation,
for example GTG and CAC, respectively, for start codons. The "Length" column gives the length
of each ORF in codons, including the stop codon (listed as "aa" in the ORF Predictions above).
The direction of translation on the depicted polynucleotide is denoted "Forward" for forward
or "Reverse" for reverse (that is, encoded on the strand opposite to the one depicted). As
indicated above, the "Assembly ID" number is a unique identifier assigned to each ORF of
Table 1 and allows a correlation between the data in Tables 1 and 2.
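In every legible row of Table 2, the Length value equals (Stop - Start + 1)/3. This is the number of codons from the start codon through the stop codon, inclusive. The stop codon, read in the direction of translation, is TAA, TAG or TGA. The Python below is a minimal sketch that checks these relationships. It also shows how one illegible coordinate could be recovered from the other coordinate and the length. For example, a row with Start 845 and Length 272 implies Stop 1660. The `Row` type and the function names are illustrative assumptions, not part of the original disclosure.

```python
from typing import List, NamedTuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


class Row(NamedTuple):
    """One Table 2 entry; reverse-strand codons are written with a dash, e.g. '-CAC', 'TCA-'."""
    assembly_id: str
    start_codon: str
    stop_codon: str
    start: int
    stop: int
    length_aa: int
    direction: str  # "Forward" or "Reverse"


def reverse_complement(codon: str) -> str:
    return codon.translate(COMPLEMENT)[::-1]


def check_row(row: Row) -> List[str]:
    """Return a list of inconsistencies (empty if the row is self-consistent)."""
    problems = []
    span = row.stop - row.start + 1
    if span != 3 * row.length_aa:
        problems.append(f"span of {span} nt is not 3 x {row.length_aa} codons")
    stop = row.stop_codon.strip("-")
    if row.direction == "Reverse":
        stop = reverse_complement(stop)  # e.g. TCA on the depicted strand reads as TGA
    if stop not in STOP_CODONS:
        problems.append(f"{row.stop_codon} does not read as a stop codon")
    return problems


def infer_stop(start: int, length_aa: int) -> int:
    """Recover an illegible Stop position from Start and Length."""
    return start + 3 * length_aa - 1


def infer_start(stop: int, length_aa: int) -> int:
    """Recover an illegible Start position from Stop and Length."""
    return stop - 3 * length_aa + 1


if __name__ == "__main__":
    print(check_row(Row("3864938", "GTG", "TGA", 883, 1326, 148, "Forward")))   # []
    print(check_row(Row("3865710", "-CAC", "TCA-", 287, 448, 54, "Reverse")))  # []
    print(infer_stop(845, 272))  # 1660
```

Only internal consistency is tested here. Confirming that a row matches its assembly would additionally require the sequence-level translation shown for the individual ORFs.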







TABLE 2

Assembly ID   Start codon   Stop codon   Start   Stop   Length (aa)   Direction
3049156       -CAC          TCA-         236     385    50            Reverse
3049862       GTG           TGA          383     526    48            Forward
3112810       -CAC          TTA-         601     804    68            Reverse
3112866       -CAC          TTA-         220     513    98            Reverse
3113664       GTG           TAA          165     392    76            Forward
[illegible]   -CAC          TTA-         94      291    66            Reverse
[illegible]   GTG           TAA          139     543    135           Forward
[illegible]   GTG           TAG          83      283    67            Forward
[illegible]   GTG           TGA          154     294    47            Forward
[illegible]   -CAC          TTA-         169     678    170           Reverse
3175138       -CAC          TCA-         79      945    289           Reverse
3175860       GTG           TAA          51      251    67            Forward
3175918       GTG           TGA          212     535    108           Forward
3811220       -CAC          CTA-         316     873    186           Reverse
3811436       -CAC          TTA-         1164    1511   116           Reverse
3811984       GTG           TGA          134     454    107           Forward
3857228       -CAC          TCA-         1141    1356   72            Reverse
3857842       GTG           TAA          45      341    99            Forward
3857996       GTG           TAA          58      456    133           Forward
3858236       -CAC          CTA-         1       261    87            Reverse
3858264       -CAC          TCA-         439     1365   309           Reverse
3858610       -CAC          TTA-         374     949    192           Reverse
3858716       -CAC          CTA-         238     402    55            Reverse
3859124       -CAC          CTA-         73      453    127           Reverse
3859244       -CAC          TTA-         310     462    51            Reverse
3859250       -CAC          CTA-         244     402    53            Reverse
3859588       -CAC          TTA-         102     443    114           Reverse
3859774       -CAC          CTA-         9       131    41            Reverse
3860140       GTG           TAA          302     511    70            Forward
3860140       GTG           TAA          605     856    84            Forward
3860206       -CAC          TTA-         898     1056   53            Reverse
3860270       GTG           TAG          346     966    207           Forward
3860438       GTG           TAG          1       276    92            Forward
3860438       GTG           TGA          460     1128   223           Forward
3860544       GTG           TAA          222     689    156           Forward
3860558       -CAC          TTA-         717     1376   220           Reverse
3860568       GTG           TAA          1040    1291   84            Forward
3860582       GTG           TGA          356     1027   224           Forward
3860724       GTG           TGA          139     498    120           Forward
[illegible]   GTG           TGA          686     1024   113           Forward
[illegible]   GTG           TAG          610     807    66            Forward
[illegible]   GTG           TAG          397     486    30            Forward
[illegible]   -CAC          TTA-         449     715    89            Reverse
[illegible]   -CAC          TTA-         152     646    165           Reverse
[illegible]   -CAC          TTA-         457     645    63            Reverse
[illegible]   -CAC          TTA-         627     824    66            Reverse
3861288       -CAC          CTA-         357     572    72            Reverse
3861306       GTG           TAA          717     1208   164           Forward
[illegible]   GTG           TAA          1201    1410   70            Forward
3861334       GTG           TAA          76      975    300           Forward
3864148       GTG           TAG          212     940    243           Forward
3864148       GTG           TAA          1202    1753   184           Forward
3864148       GTG           TAA          2750    3037   96            Forward
3864172       GTG           TAG          311     862    184           Forward
3864180       -CAC          TTA-         930     1616   229           Reverse
3864184       GTG           TGA          197     670    158           Forward
3864184       GTG           TAA          612     1304   231           Forward
3864194       -CAC          CTA-         1084    1380   99            Reverse
3864338       GTG           TGA          552     1100   183           Forward
3864360       GTG           TAA          47      1078   344           Forward
3864388       GTG           TGA          1239    1586   116           Forward
3864406       -CAC          TTA-         263     958    232           Reverse
3864452       -CAC          TCA-         1079    1201   41            Reverse
3864458       GTG           TAA          797     1105   103           Forward
3864458       GTG           TGA          1179    1391   71            Forward
3864474       -CAC          CTA-         68      247    60            Reverse
3864474       -CAC          TTA-         644     1528   295           Reverse
[illegible]   -CAC          TTA-         1164    1640   159           Reverse
[illegible]   -CAC          TTA-         845     1660   272           Reverse
[illegible]   GTG           TGA          687     1055   123           Forward
[illegible]   GTG           TAA          979     1932   318           Forward
[illegible]   -CAC          TTA-         317     550    78            Reverse
3864604       -CAC          CTA-         1       141    47            Reverse
3864604       -CAC          CTA-         1513    1803   97            Reverse
3864610       GTG           TAA          427     1305   293           Forward
3864716       GTG           TAA          57      272    72            Forward
3864718       GTG           TGA          77      1474   466           Forward
3864802       -CAC          TTA-         92      550    153           Reverse
Assembly  Start codon  Stop codon  Start  Stop  Length  Direction
?         ?            ?           324    548   75      Reverse
?         ?            ?           431    1003  191     Reverse
?         ?            ?           10     657   216     Reverse
?         ?            ?           130    1029  300     Forward
?         ?            ?           883    1326  148     Forward
?         ?            TAA         1030   1251  74      Forward
?         ?            ?           1427   1711  95      Reverse
?         ?            ?           279    1271  331     Reverse
?         ?            ?           79     492   138     Forward
?         CAC          TCA         302    793   164     Reverse
3865102   CAC          CTA         27     731   235     Reverse
?         CAC          TTA         416    808   131     Reverse
?         GTG          TAA         136    375   80      Forward
?         ?            TTA         731    1123  131     Reverse
3865228   GTG          TAA         197    286   30      Forward
3865230   GTG          TGA         272    586   105     Forward
3865378   CAC          TTA         421    807   129     Reverse
3865470   GTG          TAG         98     742   215     Forward
3865632   GTG          TAA         46     456   137     Forward
3865710   CAC          TCA         287    448   54      Reverse
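
For every fully legible row of the table, Length equals (Stop - Start + 1) / 3, that is, the span counted in codons including the stop codon. Rows in the Reverse direction list their codons as they read on the forward strand: CAC is the reverse complement of the GTG start codon, and TTA, TCA and CTA are the reverse complements of the TAA, TGA and TAG stop codons. The Python sketch below checks these relationships on a few rows copied from the table. It assumes 1-based, inclusive coordinates on the forward strand of each assembly; that convention is consistent with the arithmetic but is not stated in the table itself.

```python
# Sketch: consistency checks for ORF table rows.
# Assumption (not stated in the table): Start/Stop are 1-based, inclusive
# positions on the assembly's forward strand, and Length counts codons
# including the stop codon.

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def orf_length_codons(start: int, stop: int) -> int:
    """Number of codons in the inclusive span start..stop."""
    span = stop - start + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError(f"span of {span} nt is not a whole number of codons")
    return span // 3


def codon_as_listed(codon: str, direction: str) -> str:
    """How a codon appears in the table when read on the forward strand."""
    return codon if direction == "Forward" else reverse_complement(codon)


# (start, stop, length, direction) copied from legible rows of the table
rows = [
    (136, 375, 80, "Forward"),
    (197, 286, 30, "Forward"),
    (272, 586, 105, "Forward"),
    (421, 807, 129, "Reverse"),
    (287, 448, 54, "Reverse"),
]

for start, stop, length, direction in rows:
    assert orf_length_codons(start, stop) == length, (start, stop, length)

print(codon_as_listed("GTG", "Reverse"))  # CAC
print(codon_as_listed("TGA", "Reverse"))  # TCA
```

Rearranged, the same relation gives Stop = Start + 3 * Length - 1, so a single illegible coordinate in a row can be cross-checked against the other two values in that row.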



EXAMPLES 

The examples below are carried out using standard techniques, which are well known 
and routine to those of skill in the art, except where otherwise described in detail. The 
examples are illustrative, but do not limit the invention. 
Example 1 

Isolation of DNA coding for a virulence gene in Streptococcus pneumoniae 

As mentioned above, each of the DNAs disclosed herein, by virtue of the fact that it
includes an intact open reading frame, is useful to a greater or lesser extent as a screen for
identifying antimicrobial compounds. A useful approach for selecting the preferred DNA
sequences for screen development is evaluation by insertion-duplication mutagenesis. This
system, disclosed by Morrison et al., J. Bacteriol. 159:870 (1984), is applied as follows.

Briefly, random fragments of Streptococcus pneumoniae strain 0100993 DNA are
generated enzymatically (by restriction endonuclease digestion) or physically (by
sonication-based shearing), followed by gel fractionation and end repair employing T4 DNA
polymerase. It is preferred that the DNA fragments so produced are in the range of 200-400
base pairs, a size sufficient to ensure homologous recombination and to ensure a
representative library in E. coli. The fragments are then inserted into appropriately tagged
plasmids as described in Hensel et al., Science 269:400-403 (1995). Although a number of
plasmids can be used for this purpose, a particularly useful plasmid is pJDC9, described by
Pearce et al., Mol. Microbiol. 9:1037 (1993), which carries the erm gene, allowing
erythromycin selection in either E. coli or S. pneumoniae, and which has previously been
modified by incorporating DNA sequence tags into one of its polylinker cloning sites. The
tagged plasmids are introduced into an appropriate S. pneumoniae strain, selected, inter
alia, on the basis of serotype and virulence in a murine model of pneumococcal pneumonia.
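
Insertion-duplication mutagenesis relies on a single homologous crossover between the chromosome and a non-replicating plasmid carrying a cloned fragment, which leaves two copies of the fragment flanking the vector. The gene is inactivated only when the fragment lies wholly inside the coding region, because then neither copy is complete; a fragment that spans either end of the gene leaves one intact copy. The Python sketch below models that rule under simplifying assumptions: coordinates are 1-based and inclusive, promoters and polar effects on downstream genes are ignored, and the gene and fragment coordinates in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A 1-based, inclusive stretch of chromosome (hypothetical coordinates)."""
    start: int
    end: int

    def __len__(self) -> int:
        return self.end - self.start + 1


def gene_disrupted(gene: Interval, fragment: Interval) -> bool:
    """Predict whether integration via `fragment` leaves no complete gene copy.

    After a single crossover the region reads (forward-strand gene shown):
        ...gene start .. fragment.end | vector | fragment.start .. gene end...
    The upstream copy is complete only if the fragment reaches the gene end;
    the downstream copy is complete only if the fragment starts at or before
    the gene start. The condition is symmetric, so strand does not matter.
    Promoters and polar effects are ignored in this sketch.
    """
    if fragment.end < gene.start or fragment.start > gene.end:
        raise ValueError("fragment does not overlap the gene")
    upstream_copy_complete = fragment.end >= gene.end
    downstream_copy_complete = fragment.start <= gene.start
    return not (upstream_copy_complete or downstream_copy_complete)


gene = Interval(612, 1304)  # hypothetical ORF coordinates
for frag in (Interval(800, 1099), Interval(1100, 1399), Interval(500, 799)):
    outcome = "disrupted" if gene_disrupted(gene, frag) else "not disrupted"
    print(frag, f"{len(frag)} bp duplicated ->", outcome)
```

The duplicated segment is exactly the cloned fragment, so with the 200-400 bp fragments preferred above the two copies share only a short stretch of homology.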

It is appreciated that a seventeen-amino-acid competence factor exists (Håvarstein et
al., Proc. Natl. Acad. Sci. USA 92:11140-44 (1995)) and may be usefully employed in this
protocol to increase transformation frequencies. A proportion of the transformants is
analysed to verify homologous integration and to check stability. Unwanted levels of
reversion are minimized because the duplicated regions will be short (200-400 bp);
however, if significant reversion rates are encountered, they may be reduced by
maintaining antibiotic selection during growth of the transformants in culture and/or
during growth in the animal.

The S. pneumoniae transformants are pooled for inoculation into mice, e.g., Swiss
and/or C57BL/6. Preliminary experiments are conducted to establish the optimum
complexity of the pools and level of inoculum. A particularly useful model has been
described by Veber et al. (J. Antimicrob. Chemother. 32:432 (1993)), in which inocula of
10^5 cfu are introduced by mouth into the trachea. Strain differences are observed with
respect to onset of disease, e.g., 3-4 days for Swiss mice and 8-10 days for C57BL/6.
Infection yields in the lungs approach 10^8 cfu/lung. Intraperitoneal (IP) administration is
also possible when genes mediating bloodstream infection are evaluated. Following
optimization of the parameters of the infection model, the mutant bank, normally comprising
several thousand strains, is subjected to the virulence test. Mutants with attenuated virulence
are identified by hybridization analysis using the labelled tags from the "input" and
"recovered" pools as probes, as described in Hensel et al., Science 269:400-403 (1995).
S. pneumoniae DNA is colony blotted or dot blotted; DNA flanking the integrated plasmid is
then cloned by plasmid rescue in E. coli (Morrison et al., J. Bacteriol. 159:870 (1984)) and
sequenced. Following sequencing, the DNA is compared to the nucleotide sequences given
herein, the appropriate ORF is identified, and its function is confirmed, for example, by
knock-out studies. Expression vectors providing the selected protein are prepared, and the
protein is configured in an appropriate screen for the identification of antimicrobial agents.
Alternatively, genomic DNA libraries are probed with restriction fragments flanking the
integrated plasmid to isolate full-length cloned virulence genes, whose function can be
confirmed by "knock-out" studies or other methods; these genes are then expressed and
incorporated into a screen as described above.
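
The comparison of the "input" and "recovered" tag pools can be sketched as a simple screen over per-tag hybridization signals: a tag that hybridizes strongly with the input pool but weakly or not at all with the pool recovered from infected animals marks a candidate attenuated mutant. The Python sketch below assumes the signals have already been quantified per tag (for example, from a dot blot); the tag names, signal values and both thresholds (min_input, max_ratio) are hypothetical and would in practice be set from the preliminary experiments described above.

```python
def attenuated_candidates(
    input_signal: dict[str, float],
    recovered_signal: dict[str, float],
    min_input: float = 1000.0,
    max_ratio: float = 0.1,
) -> list[str]:
    """Return tags that are strong in the input pool but depleted after infection.

    min_input and max_ratio are hypothetical thresholds, not values from the text.
    """
    candidates = []
    for tag, inp in sorted(input_signal.items()):
        if inp < min_input:
            continue  # too weak in the input pool to interpret reliably
        recovered = recovered_signal.get(tag, 0.0)  # absent tag means no signal
        if recovered / inp <= max_ratio:
            candidates.append(tag)
    return candidates


# Hypothetical hybridization signals (arbitrary units) for four tagged mutants.
input_pool = {"tag01": 5200.0, "tag02": 4800.0, "tag03": 300.0, "tag04": 6100.0}
recovered_pool = {"tag01": 4900.0, "tag02": 150.0, "tag04": 0.0}

print(attenuated_candidates(input_pool, recovered_pool))  # ['tag02', 'tag04']
```

In this sketch, tag03 is skipped rather than flagged because its input signal is too weak to distinguish loss during infection from poor representation in the inoculum.
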
SEQUENCE LISTING

(1) GENERAL INFORMATION

(i) APPLICANT: SmithKline Beecham Corporation and SmithKline
Beecham p.l.c.

(ii) TITLE OF THE INVENTION: Novel Coding Sequences

(iii) NUMBER OF SEQUENCES: 185

(iv) CORRESPONDENCE ADDRESS:

(A) ADDRESSEE: SmithKline Beecham Corporation

(B) STREET: 709 Swedeland Road

(C) CITY: King of Prussia

(D) STATE: PA

(E) COUNTRY: USA

(F) ZIP: 19046

(v) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: Diskette

(B) COMPUTER: IBM Compatible

(C) OPERATING SYSTEM: DOS

(D) SOFTWARE: FastSEQ for Windows Version 2.0

(vi) CURRENT APPLICATION DATA:

(A) APPLICATION NUMBER: PCT/US97/19226

(B) FILING DATE: 27-OCT-1998

(C) CLASSIFICATION:

(vii) PRIOR APPLICATION DATA:

(A) APPLICATION NUMBER: 60/029,930

(B) FILING DATE:

(viii) ATTORNEY/AGENT INFORMATION:

(A) NAME: Gimmi, Edward R.

(B) REGISTRATION NUMBER: 38,891

(C) REFERENCE/DOCKET NUMBER: P50577

(ix) TELECOMMUNICATION INFORMATION:

(A) TELEPHONE: 610-270-4478

(B) TELEFAX: 610-270-5090

(C) TELEX:

(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 495 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

CTCGGTGATA GAAATAGTGT AATCATGCTT TTCTCTTCTT ATCTATACTT TGCTACTTCT 60
ATTATACAAA AAAATAAAGC GCTTGACTAG GGATTTTTAG AAAAAAAGCC TATTTTTTCA 120
AGAAAAATAG GCTTTTTGCG AACGATTGAC ACAATTGGAT TTGGTTAATT CACTCTTAAC 180
GATGGTTTTA AACGATATAT ATTTTTATAT ATGTAAATTA AAAACTTCTT TCCTTTCACT 240
TCCTACGACT TTTCAGATAC AGATAGCCAA AGAAGTTTTC ATAGAGGGCA AAAAAGAGGA 300
GGAAGGCATG AAGAAAGAAG GTCTCTGGCA AAATCATAAT AACAGGATCC TTGGCTGGAT 360
CAAAAAGCCA GGTATCATCT CCCACAAAGA GAATTTGATG GAAAAGAGTA AAGAATTGGT 420
CAAAACCAAT CAAAACTCCC CCAAGTCCAT CATCACAGGT AAGACTACTA GAGCCAGGAG 480
ACTTTTTCGA TAAAG 495



(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 529 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

CTAGAGCAAG TATTTTTCAA ACTTTTTCCG AATAAATAGA TAGAGCCAGA GAATTTAGTA 60
AACCTAGATT TAAAAATGTG CTATAACATA ATATATTGAA TCTATAATAG TACACCTTGA 120
CTGCTAAAAT ATTTCTATAA ATTAATTTGA CTTTCCTGAT AGAGTTATTC ACATCTTATT 180
TCAACTCACT ATAGAAGGAG GAATAGGAGG ATTCTCAGAC ATCCGGGCAT CAGCCCAACT 240
AATGATTTGA TTGCTAAGAA AATATTCAGC AATCCAGAAA TCACTTGTCA ATTTATTCGC 300
GATATGCTGG ACTTGCCAGC AAAAAATGTT GACCATTTTG GAGGGAAGCG ATATTCACGT 360
ATTACTCTCC ATGCCTTACT CAGTGCAGGA TTTTTATACC AGTATAGACG TCTTGGCGGA 420
GTTGGATAAC GGTACTCAAG TAATTATTGA GATTCAAGTC CATCATCAGA ATTTTTCATC 480
AATCACTTGT GGACTTACCT GTGCAGTCAG GTTAATCAAA TCTTGAAAA 529

(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 885 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

CTCATCATCT GTCAAAAAGC GTTTCTTAGC AGTCGTGATA TCCATAAAAT AATCTAATAT 60
CACGATTTCC TCATCCGCAA AGAAAGGAAG GCTGACCAAC TCCAGTGCCA CATCCTTGTA 120
AACTACTTCT TGCATATCAA AGTAGGCAAA GTTGAGGTCA GCAGAATCAT ACCCAATCTG 180
TTTCAACACT TGACTCTTCA TCACTTCAAA CTGACCCTGA TCTGTCCCTG TAAATAGGCG 240
CAGGCTCGGT AAATTCGATA AAGTCAACTT CTGACTTTCT TCAATGGCTA GCATCGTCTC 300
TCCTTTCTTC AGATTTTTCG ATTTAATTTA GTCAATATAG CGCAATTTCC CACGGAAATC 360
TTCTAAGCTC TCGTAGCCTT TTTCCACCAT GATTGCTTTC AGTTCATTGG TAAAGCGGTC 420
AAAAGCACTG ACGCCTTCTT TGTGAAGGGT CGTTCCCACC TGCACCATAC TTGCTCCACA 480
GAGGATGTGT TCAAAGGCAT CTCGACCAGT CAGAACGCCA CCTGTTCCGA TAATTTGGAT 540
TTGAGGATTT AAACGTTGAT AAAAGGCGTG AACATTGGCT AGAGCAGTCG GTTTGATGTA 600
TTATCCACCA ATTCCACCAA AACCATTCTT AGGCCGAATA ACGACAGATT CGTCTTCTAT 660
ATAGAGGCCG TTTCCGATAG AGTTAACGCA GTTGACAAAC TTGAGCGGAT ATTTGTTGAA 720
AATAGCTGCC GCTTGATCAA AGTGAACAAT ATCAAAATAA GGTGGCAATT TAATTCCAAG 780
AGGTTTGGTG AAGTAAGCAA ACACTTCTGC CAAAATCCGG TCTGTTGTCT CAAAATCATA 840
GGCAATCTGA GGTTTACCTG GAACATTTGG ACAGGAAAGA TTTAG 885



(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 925 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

TCTTGGCCAA CTGCATGGAG TTCAGCGGTC AATTTCAACG CACCTGAGAA ACAGACCCCT 60
GCACCCCTGA AATCTCAGGA GACATGATGG TCTGGATGGA ATCAATAATG AGAAAGTCTG 120
GCTGGATACG CTACCACTTC TGCACGAACA CTCTGCATAT TGGTCTCTGC ATAGAGATAA 180
AACTCACTAT CAAAATCACC TAAGCGCTCT GCACGTAGTT TAATCTGCTG GGCAGACTCC 240
TCCCCACTGA CATAGAGAAC TGTCCCCACT TGGGACAACT GGGTTGAGAC TTGTAGGAGA 300
AGAGTTGATT TCCCAATCCC AGGATCCCCA CCGATGAGGA CGAGACTTTC CTGGTACAAC 360
TCCGCCTCCA AGCACACGGT TGAATTCCTC CATCTCCGTC TTGGTTCGAT TGACATTGAT 420
GGAAGTCACC TCAGCTAGTT TCATGGGCTT GGTTTTCTCA CCTGTCAAGG ACACACGCGC 480
ATTCTTGACC TCGGCAACCT CAACCTCTTC CACAAAAGAA GACCAAGACC CACAGTTGGG 540
GCAACGTCCC AGATATTTAG GGGAATTATA CCCACAATTT TGACATACAA ATGTCGCTTT 600
TTTCTTTGCG ATGACAAACC TCTTTCTATA TCTCTAACTC ACACTCAATC ACTTGGCAAA 660
AATCAATCTT CTCATTTGGC ACAAACTGGC GCATGAGCAT TCGATGAGCA ACAACTACCA 720
CAGTCTGATG TTCTCGATAC TTAGACATAC ATTCTAGAAA CCGAGACTTC ATTTCCGTAG 780
CTGTCTCATA TTGAATAGGA CTATTAGGAA GCAACTCCCC CTTGTTTTCT AAAAACAGTC 840
TTCTAGCTGT TTCAAAGTTT TCTATTCCTG TTTTATAGAC CTGCCATTCA TGTAATAAAG 900
GCTCTACTCT TAAAGGAAGA CCCGT 925



(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 602 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

TTATGTCAGT GGGATTACGC CTAATCTCCC AGAAGCAGAA TTATTATCCG GTCAGGAAAT 60
TAAAACCTTG GNAGACATGA AAACTGCAGC GCAGAAATTG CATGATTTAG GAGCGCCAGC 120
AGTCATTATC AAAGGGAGGC AATCGTCTTA GTCAGGACAA GGCTGTGGAT GTCTTTTATG 180
ATGGACAGAC CTTTACTATC CTAGAAAATC CAGTTATCCA AGGCCAAAAT GCTGGTGCAG 240
GTTGTACCTT TGCCTCTAGC ATTGCCAGTC ACTTGGTTAA AGGTGATAAA CTTTTGCCAG 300
CAGTAGAAAG CTCTAAGGCT TTCGTTTATC GTGCTATTGC ACAAGCAGAT CAGTATGGAG 360
TAAGACAATA TGAAGCAAAC AAAAACAACT AAAATCGCCC TTGTATCCCT ATTAACCGCC 420
CTTTCTGTGG TTCTAGGTTA TTTCTTAAAA ATCCCAACAC CTACAGGNAT TCTAACTCTT 480
TTAGATGCTG GTGTCTTCTT TGCGGCCTTT TACTTTGGTA GTCGTGAAGG AGCGGTAGTC 540
GGAGGACTAG CAAGTTTCTT GCTTGACCTC TTATCAGGCT ACCCTCAGTG GATGTTTTTT 600
AG 602



(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 456 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

CTGGATACTA AGAGAAATCA AAAAAGCACT CTAGGATAGA GGCCTAAAGT GCTTAGTTTC 60
AAGGCTTTAC AGCCTATCAT ATTTAATAAA ATATTACAAC ATCTTGTTGT AGAATTCAAC 120
GACAAGTGCT TCGTTGATTT CTGGGTTGAT TTCGTCGCGT TCTGGCAAGC GAGTCAATGA 180
ACCTTCCAAT TTTTCAGCGT CGAATGATAC GAATGCTGGA CGTCCAAGAG TAGCTTCTAC 240
TGCTTCAAGG ATTGCTGGAA CTTTCAATGA TTTTTCACGA ACTGAGATCA CTTGACCTGC 300
AGTTACGCGG TATGATGGGA TATCAACGCG TTTCCCGTCA ACAAGGATGT GACCGCTGGT 360
TTACAAATTG GACCAAACTT GACGACCAGT AGTCGCGAGA CCAAGACGGT AAACAACGTT 420
ATCCAAACGA CGTTCCAAAA GAAGCATAAA GTTGAA 456



(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1961 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

CTAATATAGA ATAATCACCG CCGTTGTGAA AGAACGATTG GATGATAATC CAATCGTTCA 60
GGGAAATTGG AAGACCTTGG GTTTCCAATT TAGGCATGAG ACACCTTTGG TGGCTGCTGC 120
CGTCCCTCAC AAGCTAAGGT GATTGTTGAA AAAGAGGAAA AAGGAGAAGA AATGAAACCA 180
GTAATTTCCA TCATCATGGG CTCAAAATCC GACTGGGCAA CCATGCAAAA AACAGCAGAA 240
GTCCTAGACC GCTTCGGTGT AGCCTACGAA AAGAAAGTTG TTTCCGCACA CCGTACACCA 300
GACCTCATGT TCAAACATGC AGAAGAAGCC CGTAGTCGTG GCATCAAGAT CATCATCGCA 360
GGTGCTGGTG GCGCAGCGCA TTTGCCAGGC ATGGTAGCTG CCAAAACAAC CCTTCCAGTC 420
ATTGGTGTGC CAGTCAAGTC TCGTGCTCTT AGTGGAGTGG ATTCACTCTA TTCTATCGTT 480
CAGATGCCGG GTGGGGTGCC TGTTGCGACC ATGGCTATCG GTGAACTCTT TTTTAGGATA 540
TAAAACAGGG TTCGGATAAG TTTTTTTGCA AGGTGGATGA TGGCTACATT GTAATGTTTT 600
CCTTGTTCTA ACTTAGTCTT AAAAGCAGGT GAAAAGTGAG GGCATGCTTT GGCAGCTTGT 660
ATGAGTACCT ACCGCAGATA AGGGGAACCC CGTTTGACCA TCCTCCCAGC TAAATCAATC 720
TGACCTGACT GATAAATAGA AGAATCCAGT CCAGCGAAAG CTTGTAATTG AGCAGGATTA 780
TCAAAGGCAT GAATATTTCG AATCTCGGCT AAAATGACCG CCCCTAAACG ATTCTCAATC 840
CCAGTAACCG TCGTGATGAC CGAGTTTAAC TCAGCCATCA AGTCATTGAC ACATTTTTCC 900
GCCTTGTCAA TGAGCCTCTT GTAATGTTTG ATGTTTTCAT TACACGAGAT AAAACGTCTA 960
TGCGTTATCA AACTCATTAC CAATTAAAAC AAATGTGGTT AGATCCTTTC GGAAATTGTC 1020
AAGCGATTGG AGGAAATGAA CTAATCCACA GCGGCTTATT CCAAGTATAC CACTTGGGCT 1080
TTGGCAGTAG CTAACTGCGC TAAATATAAT ATAAGGAGGA GTAAAATGAA GACAGTTCAA 1140
TTTTTTTGGC ATTATTTTAA GGTCTACAAG TTCTCATTTG TAGTTGTCAT CCTGATGATT 1200
GTTCTGGCGA CTTTTGCCCA AGCCCTCTTT CCAGTCTTTT CTGGACAAGC GGTGACGCAG 1260
CTAGCCAATT TAGTTCAAGC TTATCAAAAT GGGCAATCCA GAACTTGTAT GGCAAAGCCT 1320
ATCAGGAATT CATGGTCAAT CTTGGCCTGC TGGTTTTGGG TTCTATTTAT CTCTAGGTGT 1380
AATATAAACA TGTGTCTCAT GACGCGCGTG ATTGCAGAAT CGACCAACGA GATGCGCAAA 1440
GGTCTCTTTG GTAAGCTTGC TCAGTTGACG GTTTCTTTCT TTGACCGTCG ACAAGATGGC 1500
GATATCCTGT CTCATTTTAC CAGTGATTTG GATAATATCC TCCAAGCCTT TAACGAAAGC 1560
TTGATTCAGG TCATGAGCAA TATTGTTTTA TACATTGGTC TGATTCTTGT CATGTTTTCG 1620
AGAAATGTGA CGCTGGCTCT CATCACCATT GCCAGCACCC CATTGGCTTT CCTTATGCTG 1680
ATTTTCATCG TGAAAATGGC ACGTAAATAC ACCAACCTCC AGCAGAAAGA GGTAGGGAAG 1740
CTCAACGCCT ATATGGATGA GAGCATCTCA GGCCAAAAAG CCGTGATTGT GCTAGGAATT 1800
CAAGAGGATA TGATGGCAGG ATTTCTTGAA CAAAATGAGC GCGTGCGCAA GGCAACCTTT 1860
AAAGGAAGAA TGTTCTCAGG AATTCTTTTC CCTGTCATGA ATGGGATGAG CCTGATTAAT 1920
ACAGCCATCG TCATCTTTGC TGGTTCGGCT GTACTTTTGA A 1961



(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 375 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

CTATCTCCAA GTNCGNTTGG AATNCCTCCG CNANCCACAA CTCATCCAAG CACTTTNCAA 60
CGTGNCCTGG TCCGGTCCTC CAGTGCGTCT NACNGCACCT TCAACCTGCN CATGGGTAGG 120
TCACATGGCT TCGGGTCTAC GTCATGATAC TAAGGCGCCC TATTCAGACT CGGNTNCCCT 180
AGGGCTCCGT CTCTTCAACT TAACCACGCA ACAGAACGTN ACCCGCCGGT TCATTCTACA 240
AAAGGCAGNC TCTCACCCAT TAACGGGCTC GAACTTGTTG TAGGCACACN GCTTCAGGTN 300
CTATTTCACC CCCCTCCCGG GGAGCANCTC AACTGACCCN CACGGCACCG GTGNANNAAA 360
CGGTCACTTA GGGAG 375



(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 665 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

GGGGGGGGTN NNTTCTGGGG CCGGGTGNNT CCTNGAAAAA ATGCTGGACT TAACGGTTAA 60
ATCATTTGAA TTGGCCTGTG GATTTTAGCT AGCAATCCAG AGCGAGTTTT CTCCAAGACA 120
GACCTCTATG AAAAGATCTG GAAAGAANAC TACGTGGATG ACACCAATAC CTTGAATGTG 180
CATATCCATG CTCTTCGACA GGAGCTGGCA AAATATAGTA GTGACCAAAC GCCCACTATT 240
AAGACAGTTT GGGGGTTGGG ATATAAGATA GAGAAACCGA GAGGACAAAC ATGAAACTAA 300
AAAGTTATAT TTTGGTTGGA TATATTATTT CAACCCTCTT AACCATTTTG GTTGTTTTTT 360
GGGCTGTTCA AAAAATGCTG ATTGCGAAAG GCGAGATTTA CTTTTTGCTT GGGATGACCA 420
TCGTTGCCAG CCTTGTCGGT GCTGGGATTA GTCTCTTTCT CCTATTGCCA GTCTTTACGT 480
CGTTGGGCAA ACTCAAGGAG CATGCCAAGC GGGTAGCGGC CAAGGATTTC CCTCCAATTT 540
GGANGTTCAA GGTCCCTGTT AAATTTCCCC CATTTAGGGG CAACCTTTTA ATGAAANTTT 600
CCNTNATTTG CCGGGTANCT TTGAATCCCT NGGAAAAAAC CCAACNAAAA AAAGGGCTTA 660
NNCCC 665



(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 989 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

CTACGATATC TTTGGTCTTT TGTAAGATAT GAGGTCCACC CTTATGCGCC TCAGTTGGCA 60
TTTCATGCGA TTCAAGAAGT TGCCCCTCTT GATCAACCAA ACCATACTTG ATGTTGGTTC 120
CACCGATATC AATTGCAACG TAATATGTCA TAAATACCTC CTTTTAGATT AGAGGAAGCG 180
CTCCTTGGTT TCACGAATCA AGGCAGCAGC CGCTTCTACA ACTGGACGAT CTTCTTCAGT 240
CACTGGTGTC AATGGTGAAC GAACAGATCC AATATTCAAG CCTTCATTGA TTTTCAAGAC 300
TTCTTTGATG ACACCGTACA TATTTCCATG AGCAGAAGTG AGTTTACCAA TGATTGCGTT 360
GATAGCATAC TGCAATTCAC GCGCTGTTTC TAGGTCCTTA TCCGCAATCA ACTGATTGAG 420
TTTCAAGAAG AGTTCTGGCA TAGCACCATA AGTACCACCG ATACCAGCCC TAGCCCCCAT 480
GAGGCGTCCT CCTAGGAACT GCTCATCAGG ACCATTAAAG ACGATATGGT CTTCTCCACC 540
AAGGCTGACA AAGGTTTGGA TATCTTGAAC TGGCATAGAA GAGTTCTTCA CACCGATAAC 600
ACGAGGATTT TTCAACATTT CTGTGTAAAG GCTTGGAGTC AAAGCAACCC CTGCCAATTG 660
AGGAATGTTG TAAATCACGT AGTCTGTGTT TGGAGCTGCA GAACTGATAT CGTTCCAGTA 720
TTTGGCAACT GAGTTATTCT GGCAAGCGGA AATAAATTGG TGGAATCCGT TGCAATAGCA 780
TCTACTCCCA AGCTTTCAGC ATGGCGAGCA AGTTCCATAC TATCTTTAGT ATTATTGCAA 840
GCAACATGGG CAATAATGGT CAATTTACCT TTGGCTACCG CCATGACTTC TTCCAAAATC 900
AACTTGCGAT CTTCAACGCT TTGGTAGATA CATTCACCAG AAGAACCATT GACATAAGAC 960
CTTGAACACC TTTATCAATG AAGTATTGA 989

(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1450 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

CTCCATATTT CTTAGCCTTC TCAATTAGGG TCTTGAAGTC TTCGACACCA CCGATACGCT 60
TACCAATATC AGCATAGTTC AAGTGACCAG AGTCATGGCT GTGATATCCT TAACTTTTTC 120
CCAACCTTGA GGGTTGTTCA TAATGCTACG ATAAGCAATG GCACCATCTT GCCAATCAAC 180
TTTCTTGTCT GCATTGGCAT CTTCAGTGAT AACAACCTTA GCACTTGGAA GTTCCTTCGT 240
GTATTCTGGG AAAACAATGC CCTTATAAGC TTTTTCCCAT TGCCATTCAG AGCTGTGGAT 300
TCCTACATAG TTGGCATTTC CGACTGTTTC TTTATAAGCT GTCAAACGAG TCCAGTCATT 360
CGAACCACCA CCATAGCTAT TTTGAGAGTT ACTCCAAACA CCAGCAGCAA GCTTATCTGT 420
AGAAACAAAT CCATACATGT AACCCTTAGC CAAATCCTTC ATTGGATTGG TTACATCGAT 480
ATGATCATCT CCGCTGACAT GCGTATTGTT TGACATGGTT GCCCCATCAA ACTTAGCACC 540
AGTTTGATCA CTAGAAACAG AGACTAAAGC ATTGCCGAGG AAACTAATAG AAGAAAGTAG 600
TTTTCTTTCG TCATCAATCT TTTGACCTGG AGTGACTTGA TTGTGGTTGA CAATCTTGGT 660
CACATCAAAG TGCAATTGAT TGTCCACAAC TTGCAAGCGT ACTGTCATTT CCGCATTGAT 720
TAAGTGAGCA TCATCGCGAA GCTTCATCAA GTACTCTGCT GTTGTCTCAT TGATTTTTTT 780
ATAAGTGACT TCAGGGGTGA TTCGGTGGTT ATTGATAAAG ACTTGGTTGA ATTGTTGCAC 840
CTGTCCTGGC AAAGTATGTC CATTCAAGGT GTATCCCTTG ACACGAAGGA AGGCTTGGTC 900
AATTACTGCC TTAAGTACCT TAAACTGGAT CGTATCATAA GTCACCTTGC TATCGTCAAC 960
AACCGGACCT GTTTCTTTCT GGGCAGGGGT ATCCTCTGGG TTTTACCCTC TCTGTGGCTA 1020
TCCGTTTCAA CGCTTGAACA ACTGGTCGCT CATCGTCATA AGAGCCCGCC TTGAGAAAAA 1080
TCTTCTTCTC ATTTCTAAGA TGGTCATTGA CCGCAGCTGG TAGAGTCACT GTGTCAAAGA 1140
AGATTGACAT CCTTATTTGC CTGGCATTTA CCTGACCGTC TGACTTGAAG ACTGATAGAG 1200
AGACGGTTTG TTGATCCTGT TTCAGGAGCA GCAACACGAC TACCTCTATA CCAAGTGCTA 1260
GTTGTTGGAG ATTTATACTC CCAGAACCAG CCATCCTTGT CATAACCGAC AAAAACATTA 1320
TTATTGGTAT CTTTAAATTT CAAGGAGACA CCAAAGCGTG ATTTGCCCTT TTCAGAATCT 1380
TCTTTGAAGG TTAAATCAAC AGTTGCATTT CCATTGGCAT CAACGGTCAA GCCCTTCTTT 1440
TCAAACAGAG 1450

(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 420 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

CTGCGAGTTG TGAGGCTCCT ATTATGTCTC GTGATTAAAA TCTCTATAAG GTGATTTTGG 60
AGGGAAATTA TCGGGCGACA GCGGGTAGAG AAGAGATGAA AGAGGCTATT TTGGAATATC 120
AAGCAAATCC TGCTGCCTTA AAAGATCTCA AAGAAAAGGC TAAGAATATT TCCAGAGAGT 180
ATTCTGAAGA GCATCTGTTA CAAATCTGGT TGGACTTTTA TGAGAAACAA GCCGCTTTAG 240
GGACAAAGTA AAAAGTGAGG TAATCTATGC GAATTGGTTT ATTTACAGAT ACCTATTTTC 300
CTCAGGTTTC TGGTGTTGCG ACCAATATCC CAACCTTGAA AACCCACCTT GAAAACACGG 360
ACTTGCCTGC ATTTNTATCT CATACAATCC ACCGAATTTC GATGTCCCCC TCCCTACAAC 420



(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 661 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:

CTCCCCAAAC TTTTATTTGA GAGTGAACGG TATAAGAATA TGAAACCGGA GGTTAAGGTG 60
GTTTACTCAG TTTTAAAAGA TCGGTTGGAG TTGTCTTTGA GCAAAGGTTG GATTGATGAG 120
GATGGGACTA TTTATTTGAT TTATTCCAAT TCAAATTTGA TGGCACTTTT AGGCTGTTCA 180
AAGTCAAAAT TACTCTCCAT GTGAGTTTGA AGTGACATTT TTAGATGATT ACCATAAAAA 240
ACATAACTAC CCACTATTTT ACGAATCCTA TCTTCAAAAC GTTATGGAAT TCCTTGAAAG 300
TCAAGACATA AAGAATGGGG TTGATGCCTT TGTAGATGAT CATCAAAATC TCGTTTTTGT 360
TTTATATGGA CAAGGCTATC GAGCCGAGGG AAAAGAGGGA ATACTTACAA CCCAAGTAAC 420
TGTAAAAGCT TATGATGAAG ACAAGAAACC GATTAACTTC GCAAATTTAT TAGATTCCTT 480
AATCGTGTCA GAATATCAAA TGGAACCGAA TCTTTGGGAG GTCTCCTATG ATTGATCTCT 540
ATCTAAGTAA AAATAGCCGA AGAAATCAAC TTCTTTTAGA CTTCTTCCAA AACTATGGCA 600
TCGAGGTATC TTGTCATTCA GTTTCTGAAA TGACAAAGGA CAAATTAATT GAGATGATGA 660
G 661



(2) INFORMATION FOR SEQ ID NO: 14:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1429 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:

CTGCCCCTGT AAGGCTGGAC GATTGCCTTT CTTAGTATCC GCAAAGAGGT AAACTGAGAA 60
TAGAGAGGAT TTCTCCTTCA ATATCTTTGA CAGACAGGTT CATCTTGCCT TCTACGTCTG 120
AAAAAATCCG CATATTGACC AGTTTTCTCA CAGCATAGTC CAAATCTTCC TCTTGGTCCT 180
CTGGTCCAAC ACCAACCAGC AATAAAAGTC CCTGATTGAT TTTTCCCTGA ATCTGGCCTT 240
CTATACTCAC TTGGGCTTTT TTAACCCGTT GGATAATGAT TTTCATAATA GCCTTTCTAG 300
TAAGAGCTAG GACAACTAGC CGTTGGTCCG TTTGACAGAG TAAACTTCTG GCACACTCTT 360
AATTTTATCG ACAACCGTGG TCAGTGTAGA GAGGTTGGCA ATACCGAAGG ACACATGGAT 420
ATTAGCAAAC TTCATATCCT TGGTTGGTTG GGCATTGACC GTTGAAATAT TCTTGGTTGT 480
ATTTGAAAGA ACTTGCAGTA CATCGTTCAA CAGTCCTGTA CGGTTGAGAC CGTAGATATC 540
GATATGGGCC ATATACTCCT TATTTGAGCT AGAGTACTGG TCTTCCCATT CCACATCAAG 600
GAGACGTTGC TCGTAGTTTT CTTGGGCACG CAGGTTCATA CAGTCCACAC GGTGAATAGC 660
CACACCACGA CCCTTGGTAA TGTAGCCAAC AATATCGTCA CCAGGCACGG GGTTACAACA 720
CTTAGCAATC CGCACTAGGA GACCAGAAGC ACCTTCAATA ACCACTCCCC CCTCATGCTT 780
GACCTTGGAG AGTTTCTTTA TTTTCAACCT TGACCTCGCC ACCTTTGACA AGCTCCTCTG 840
CCTCAGCCTT GGCCTTGGCA CGCTCTTCCT CACGGCGTTC TTTTTCAGTC AGACGGTTAA 900
AGACGGTAAT CGCACCGATT TCCCCAAAAC CAATGGCCGC AAAGAGGGAG TCTTCTGTCT 960
TGTAACTGGT CTTTTGCAGA ACTTGATCCA TGTGGCGCTT GTCCATAAAT TTATTTGCCA 1020
CATAGCCATT TTCTTGGAAC TGAGCCATCA GCATCTCACG ACCCTTGTTG ACAGACAATT 1080
CCTTATCTTG GTTTTTAAAG AACTGGCGAA TCTTATTGCG CGCCTTGCTA GTCTTGACCA 1140
TATTGAGCCA GTCACGGCTA GGTCCAAAGG AGTTCGGGTT GGCGATAATT TCAACCTGAT 1200
CCCCTGTCTT TAACTTGGTT GTCAGTGGAA CCATGCGGCC ATTGACCTTG GCACCAGTTG 1260
CTTTTTCACC GACCTTGGTA TGGATTTCGT AGGCAAAATC AATCGGTCCT GAATCTTTGG 1320
GAAGAGAACG GACAGCTCCA TCTGGGGTAA AAACGTAAAT CTCCTCAGCC AGATAGTTTT 1380
CCTTAACAGA GTCCACAAAT TCCTTAGCAT CATCAGCCTG GTCTTGGAG 1429



(2) INFORMATION FOR SEQ ID NO: 15:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1513 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:

CTCTGCAATG ATGTACTCAA ACATCTCCGC TTCTAGTTCC TCCTTAGGCA GAGGCAATTT 60
CCCACGTCGC ATCCGGTTCA TAAAGACCGT ATGGTTTTCT AAAATCAAAC TATACAAACT 120
CATGTGGGGA ATATCCAATC CAATGGCTTT AGCCACATTT TCCTTTACTT GCTCCATGGT 180
CTGACCAGGC AGAGCATAAA TCAAATCAAT GGAGATGTTG TCAAAACCAG CCAGTTTCAG 240
GCGATCGATA TTTTCATAAA TATCCTTCTC CAAATGACTG CGCCCAATCT TTTTCAACAT 300
CTTATCATCA AAGGTCTGGA CACCTAGCGA AACACGATTG ACAGCCGAAT TTTTCAAAAC 360
AGCTATCTTA TCCGCATCCA AATCGCCTGG ATTGGCTTCA ATGGTCAACT CTTCCAAGAC 420
AGACAAATCC AAGTTTTTAG TCAAGCCATT CAGTAACACC TCCAGTTGCG GAGCCGACAG 480
GGCTGTCGGT GTTCCACCAC CGATATAAAG GGTTGACAAC TTTTCAATAT CATAAGAACG 540
AAACTCTTCC AGCAGATGCT CTAAATAGCT GTCGACTGGC TGATTTTTGA TGAAGACCTT 600
TGAAAAATCA CAATAATAAC AAATCTGGGT ACAAAATGGG ATGTGCACAT AGGCTGACGT 660
TGGTTTTTTC TGCATAGTAA TTATTATACC ACAAAGACTA GATTCCAGAT AAAAATCACC 720
ATCCCCAGAT ACATAGTCCG TCCGGAGATG GTGATGGTTT ATTCTTCTGT TATATCAATC 780
ACAATCTCTT CTGAGTCATC AAGAGCTTCG GCTTTTTCTT GCCATTGTTC CTTGAGATTA 840
TTTAATTGAT TTTTTGATGC TTCTGTCGCT TGAAAAGCAT AGGATTTAGC TTGAGCAAGT 900
ATACTGTCCA CAGTGATTTC ACCTGACTCA ACCTGTTCTT TTGTTTTCAG AACAAAATCT 960
GTAGCCTGCT CCTTAACTTC TGTCAGTTTT TCACAGACTT GCTCCTTGGC ATACTCCGGA 1020
TCTTCTCTCA AATCATCTAA AAAATCTTGA GCCTGACTGC AAACTTGTTT GCCCTTATCA 1080
CTTGTTAAAA ACAAGGCAAG AGCTGCACCT GAAACGGTTC CTAAAAGGAT TGAGGATAAT 1140
TTACCCATAA GGATTCTCCT TTTTTATTTT TTGAAAAATT TACTTGCAAG ACGAAGAGCT 1200
GACAGACTTG CACCAGTCTT GAGTGTTTTT GAACCAGCTG ATGAAGCTTT CTTGCTCAAG 1260
ACACGCGCAT GGTCATTGAG GTCTGAAACA GATAGAGATA AATCTGCAAC AGCACTGAAG 1320
AGTGGATCAA TCGTAGCCAC CTTGACATTG ATATCATCTG CCAAGACATT GACCTTAGCC 1380
AACAACTCAT TGGTGTGATG CAAGGTCACA TCCACATCTG AAGTCAAGGT TTTAATCGTC 1440
TTTTCTGTTT CATCGATGAC ACGACCAAGC TTTTGTACAG TAATGATCAG ATAGACCAAA 1500
AAGACAATCA CAG 1513



(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 505 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:

CTCTTGTCAG AGAAATTTAC AAAACGTTAG GAGAATAAGA TGGCATTTAT TGAAAAAGGT 60
CAAGAAATCG ATATGGAAGT CATCAAGGCT GAAACCCAAT TGTCTGCAGA AGCCTTGAGA 120
CTCAAGGAAA GCCGTGACAG GGAATTGGCA GATATTATTT CAGGGGAAGA TGACCGTATT 180
CTCTTGGCTG ATTGGTCCTT GCTCTTCTGA TAATGAAGAG GCGGTCTTGG AATATGCTCG 240
CCGTTTATCC GCCTTGCAAA AGAAGGTAGC GGATAAGATT TTCATGGTCA TGCGCGTGTA 300
TACTGCTAAG CCTCGTACCA ATGGAGACGG CTATAAAGGG TTGGTTCACC AGCCAGATAC 360
TTCTAAGGCT CCAACCCTGA TTAACGGCTT GCAGGCTGTG CGCCAGTTGC ACTACCGCGT 420
TGATTACAGA GACTGGTTTG ACAACGGCAG ATGAGATGCT TTATCCGTCA AATCTGATCT 480
TGGTGGATGA CTTTGGTCAC CTACC 505

(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 182 7 base pairs 

(B) TYPE: nucleic acid 

( C ) STRANDEDNESS : s ingle 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 



CTCTTTTAAC 


CGTTTTAGCG 


GTGACACCGA 


GGATTTTTTC 


AGGACCCAAG 


ACTTGTCGGG 


60 


CAACCGAAAC 


TGGGAGTTCG 


TCATCTCCAA 


T ATGC AG AC C 


AGCAGCATCA 


ACCGCAAGAC 


120 


AAACATCCAA 


CCGATCATCG 


ATTATCAAGG 


GGACCTGATA 


GGCATCTGTT 


ATTTCCTTGA 


180 


CTTGTTTTGC 


CAGTTGATAA 


TATTGATTGG 


TTGTGAGATT 


TTTTTCTCGC 


AATTGG AC T A 


240 


TGGTAACCCC 


TGAACGGCAG 


GCCGTCTCAA 


CTTTTGCAAG 


AAAGCTTTCC 


ACGGAATCTT 


300 


GATAGCGATT 


GGTTACCAGA 


TATAGTCTAA 


GCGCTTCTCT 


ATTCATAAAC 


CTCTCCTTTG 


360 


ATGGTATCTA 


GCCAATTTTC 


ATCTCTTCTT 


AGGAGCGAAA 


GCTGATTGAG 


TACTTGGTAA 


420 


CGAAATTCTT 


CCAATCCCAT 


TCCTTGAACA 


ACTATTTTCT 


CAGCAGCGAT 


ATTGAGATAA 


480 


GAGACTGCTA 


AGCAAGAACT 


TCAAAACCAG 


TCTTTCCTTG 


GCTGAGAAAA 


ACAGCTGTTA 


540 


AGGCTCCAAC 


CAAGTCTCCT 


GTCCCTGTTA 


TCCAGTCTAA 


TTCAGTACAG 


CCATTCTCAA 


600 


GTACAGCAAC 


TTGATTCTCC 


GAAACAATAA 


GGTCCTTGGG 


ACCTGTGACT 


AAGAATGACA 


660 


TACCACGATA 


GGTCTGACAC 


CAGTCTTTCA 


AGACTTGAAG


CAAATCCTCC 


GTTTCTTGAT 


720 


CTTTAGCACT 


CGCATCGACC 


CCAACGCCGT 


GATGCTTTAA 


TCCAACAAGA 


CTTCGAATTT 


780 


CTGACATGTT 


TCCTTTAAGG 


ACCGTAGGTC 


TATAGTCTAA 


AAGGTCTTTA 


ACTAAGCTCT 


840 


TACGAATGGA 


TGAAGTCGTT 


ACGCCAACCG 


CATCTACTAC 


CATCGGGAGA 


GAAGATTGGT 


900 


TTGCATACAA 


AGCTGCCATG 


CGGATTGCTT 


TTTCCTTCTC 


AGCTGACAAA 


TGCCCCAAAT 


960 


TGATGAAGAG 


AGCCTGGCTT 


TGCTTAGTAA 


AATCAAGAAC 


TTCACGGGGA 


TCATCTGCCA 


1020 


TGACAGGTTT 


GCATCCCAGA 


GCCAAAATCC 


CATTTGCCAG 


CATCTCACAA 


GAAATCTCAT 


1080 


TGGTCATACA 


GTGAATGAGG 


GAACTAGAGC 


CTATAGGAAA


AGGATTTGTC 


AATGCCTGCA 


1140 


TCATTCTATC 


CTTTCAGCAA 


AGAAATATCC 


TTGCACTTTT 


TTAAAGAATT 


CCTGCTTGAT 


1200 


TAAAAATCTA 


AATGCAATAA 


AGGAAATCGC 


TGTACCAATC 


AAGGTTGCTC 


CGAAAAATCG 


1260 


AGGCGTGTAG 


ATAAACCAAC 


TAAGCTTAGC 


AGCCGATCCT 


GTAAAGAGCA 


CCATAACAGG


1320 


ATAGGAAACA 


ATAGAACCAA 


TAATACCTGT 


TCCCACAATT 


TCTCCCAAGG 


CAGAAAAGTA 


1380 


AAATTTTCGA 


CCGTACTTAT 


AAAAGAGACC


TGCTAGAAGG 


GCTCCAAAAG 


TCGCTCCTGT 


1440 


GAGAGATAAA 


GGAGCTTATC 


GGAATACCCT 


TGAGTCGTCA 


TACGGATAAA 


GGCTGTCACT 


1500 
GTAGCCATAG CCAAGGCATA AACAGGTCCC ATCATGATTC CCGCTAGAAT ATTGACTACA 1560

CTGGACATCG GTGCCATTCC CTCAATCCGA AAGATAGGTG TAAGGACTAC ATCAAGGGCA 1620

ATCATCATAG ATAAAATGGT CAATTTGTGA ACTTGTAGTT GGTGCTTTCT CAAGTTTCTA 1680

TTCTTCTCCT TTTTCTAAAG ACTGTAAATC GCTCTTCCAT GTCTGGTGTT GGTAAGCCAT 1740

CTCCCAAAAC TTGGCTTCCA TATGAACACT GATGTGGAAG GCATCTAGCA TTTTTTGCTT 1800 

ATCTGTCTCA TCACTTTCTC GATAGAG 1827 

(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 485 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 



CTATTGCCAA 


TCCATATAGC 


CTATCAGGTG 


GTCAATAACA 


ACGTGTGGCC 


ATCGCTCGTG 


60 


GCCTATCAAT 


GAATCCAGAC


ATCATGCTCT 


TCGATGAACC 


AAATTCTGCC 


CTTGACCCTG 


120 


AGATGGTTGG 


AGAAGTAATT 


AACGTTATGA 


AGGAATTGGC 


TGAGCAAGGC 


ATGACCATGA 


180 


TTATCGTAAC 


CCATGAGATG 


GGATTTGCCC 


GCCAGGTTGC 


CAACCGCGTT 


ATCTTTACTG 


240 


CAGATGGCGA 


GTTCCTTGAA 


GACGGAACAC 


CTGACCAAAT 


CTTTGATAAC 


CCACAACACC 


300 


CTCGTCTGAA 


AGAGTTCTTA 


GATAAGGTCT 


TAAACGTCTA 


AACTCAAACT 


GCAAGGATTT 


360 


CCTTGCAGTT 


TTTCTACCTC 


GTATTGGAAT 


TTTTGATTTT 


TCGGAAAATT 


ATGTTAGAAT 


420 


TAAGTTTATG 


AAATGAGGTT 


TCCTCATACC 


TAGCAAGACT 


AGGAATAAAA 


ATAGAAATTA 


480 


GGTAG 












485 



(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1547 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 

NTCTTGGGCN CNGGGCGNNT CCTTTGAGGA CNACGGTATC GATGACCTTG ATCTCAAGTG 60 

CAAGCAGTAT CTGAATCTGC AGCAGCACCT GTCCGTGCAA AAGTTCGTCC AACATACAGT 120

ACAAACGCTT CAAGTTATCC AATTGGAGAA TGTACATGGG GAGTAAAAAC ATTGGCACCT 180

TGGGCTGGAG ACTACTGGGG TAATGGAGCA CAGTGGGCTA CAAGTGCAGC AGCAGCAGGT 240 
TTCCGTACAG


GTTCAACACC 


TCAAGTTGGA 


GCAATTGCAT 


GTTGGAATGA 


TGGTGGATAT 


300 


GGTCACGTAG 


CGGTTGTTAC 


AGCTGTTGAA 


TCAACAACAC 


GTATCCAAGT 


ATCAGAATCA 


360 


AATTATGCAG 


GTAATCGTAC 


AATTGGAAAT 


CACCGTGGAT 


GGTTCAATCC 


AACAACAACT 


420 


TCTGAAGGTT 


TTGTTACATA 


TATTTATGCA 


GATTAATTTA 


CAGAGGGACT 


CGAATAGAGC 


480 


CCTCTTTTCA 


GGTTTTACCG 


TGACAATCCC 


TATTAAAAAT 


TATATCAAAA 


TCGTGAAAAT 


540 


ATTGGAAAAG 


TATGGTAGAA 


TGAAAATTGT 


CGTGTGAACG 


ATAATACTCA 


TTCTTGATGA 


600 


ATTGTGAAGC 


AGTTGCCCTT 


GGGTCGTTTT 


GCGAGTTGAA 


GTCAAGAAGA 


GGAAAAAAAC 


660 


AAAAAGGAGA 


AATACTCATC 


GAATTTCAAT 


GAAACAACTT 


CTTGAGGCTG 


GTGTACACTT 


720 


TGGTCACCAA 


ACTCGTCGCT 


GGAATCCTAA 


GATGGCTAAG 


TACATCTTTA 


CTGAACGTAA 


780 


CGGAATCCAC 


GTTATCGACT 


TGCAACAAAC 


TGTAAAATAC 


GCTGACCAAG 


CATACGACTT 


840 


CATGCGTGAT 


GCAGCAGCTA 


ACGATGCAGT 


TGTATTGTTC 


GTTGGTACTA 


AGAAACAAGC 


900 


AGCTGATGCA 


GTTGCTGAAG 


AAGCAGTACG 


TTCAGGTCAA 


TACTTCATCA 


ACCACCGTTG 


960 


GTTGGGTGGA 


ACTCTTACAA 


ACTGGGGAAC 


AATCCAAAAA 


CGTATCGCTC 


GTTTGAAAGA 


1020 


AATTAAACGT 


ATGGAAGAAG 


ATGGAACTTT 


CGAAGTTCTT 


CCTAAGAAAG 


AAGTTGCACT 


1080 


TCTTAACAAA 


CAACGTGCGC 


GTCTTGAAAA 


ATTCTTGGGC 


GGTATCGAAG 


ATATGCCTCG 


1140 


TATCCCAGAT 


GTGATGTACG 


TAGTTGACCC 


ACATAAAGAG 


CAAATCGCTG 


TTAAAGAAGC 


1200 


TAAAAAATTG 


GGAATCCCAG 


TTGTAGCGAT 


GGTTGACACC 


AATACTGATC 


CAGATGATAT 


1260 


CGATGTAATC 


ATCCCAGCTA 


ACGATGACGC 


TATCCGTGCT 


GTTAAATTGA 


TCACAGCTAA 


1320 


ATTGGCTGAC 


GCTATTATCG 


AAGGACGTCA 


AGGTGAGGAT 


GCAGTAGCAG 


TTGAAGCAGA 


1380 


ATTTGCAGCT 


CCAGAAACTC 


AAGCAGATTC 


AATTGAAGAA 


ATCGTTGAAG 


TTGTAGAAGG 


1440 


TGACAACGCT 


TAATTTATAC 


AAATAGTAAT 


TACCTAGGAG 


GGCGGGGCTT 


AGCCCGGCTC 


1500 


TCCTATTTTC 


AAAAAATATA 


GGAGAATTAA 


AATGGCAGAA 


ATTACAG 




1547 



(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 740 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:20: 



CTATAAAAAA 


AAGGGTAACC 


AGTATGGAGG 


ATGAATGTCT 


GGAACTATCT 


GAGAATCTCG 


60 


GATTTTGGAA 


ATCAGACCGA 


TCATCATGAG 


ATAAGGAAGG 


AAAGCACTTG 


TAAAAAGCAC 


120 


TGTAACCACG 


CCAGTCCCCT 


GTCCCAAGAG 


GGTGAGGTGG 


TAGCGTAAAA 


CCATGCGGAA 


180 


AAATCCCTTT 


TTAGTGGTTG 


AAATTCTCTC 


CTTGCTGCGA 


CGTTCTTTTT 


TGACCTTCTC 


240 


CTCACTATTA 


AGCAGGATCA 


CGTCATAAAA 


ACGAGGAAGG 


ACCTTCTTTT 


TGGTCAGATA 


300 


AAGCAGGAAG 


AGAGTTAGTC 


CTATCCAAGC 


GAGCAGACCC 


AATATGGCTT 


CTATTGAAAA 


360 


AGGCTCCACT 


GCTATTTTGT 


AAAAGATATG 


AAGAGGATAA 


AGGAGAAATG 


GAATGTCTCT 


420 


AACTTTGTCA 


ACAATACTTC 


CAAAAGTCGA 


CTGAAGAAAG 


AAGATAAATA 


TTAAAGGTAT 


480 
GAGAACTCCT ATCCCAATCA TCACATTCGA AAAAATAGAC TGATACTTTC TGAAGACCCT 540

AGTCTGAGCC AAGAAATGTA CTGCCACTAC CGTCACTAAA GTAACAGAGA CAAATAATAA 600

GGTCAAGGAC AGTAGCATCA AAGGCAAACC CAGCCAAAGA GAAGGAGCTA GACTAATATA 660

GAGGGCTAGA AAATAAGCTA GGATTGGTAC AATTCCAGTT AGAGCTGGCA AGAGGACAGA 720

CAGTCCTTTA GCAATTCGAT 740 

(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2219 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:



ATCGAATTCG 


TTTTGCAAGT 


GGCGAAATGC 


GAACCACGTT 


TGTGTCTTTA 


TAAGTTTCCA 


60 


CGTCTTCTTT 


GTGGACACGA 


CCGTTTGCAC 


CTGAGCCAGA 


AACGTCGTAG 


AGGTTTATCC 


120 


CTAAATCATC 


CGCTAACTTT 


CTAGCTGCAG 


GAGTCGCTCT 


TAGCTTGTCA 


TCAGCCATGA 


180 


CCTCTCCAAT 


TCTATTTATG 


ATACAAAGGG 


CGTCAAAAGC 


GACTGAAAAA 


TAGGAAATCG 


240 


ACGATGGCTT 


CGATGAAGCC 


AAGGAGATTT 


ATCTTTTTTT 


CCAAGCTTTT 


AGCCCGTGCT 


300 


CTAATCTAAG 


ATATTAAGGA 


CGAAGAGCTC 


TGCACCTAAA


AGATACAAAG 


TTCTCGTCAG 


360 


CTTTGTTTTA 


TTTACATAAC 


TTATCTTATG 


TAACTCTATT 


CTTTGTTATA 


AGTTTTTCGG 


420 


ATTGCATCTT 


TGATACTTTC 


AACTGTTGGA 


ATCATTGCAC 


ATTTTTAGGT 


TTTGCGCATA 


480 


AGGCATCGGC 


ACATCTTCTC 


CTGCACAACG 


GCGGATTGGT 


GCATCTAGAT 


AGTCAAATGC 


540 


TTCTGATTCT 


GAAATAATAG 


CTGAAATTTC 


ACCGATATAG 


CCACTTGTTT 


TGTGGGCATC 


600 


GTTGACCAGA 


ACAACCTTAC 


CAGTCTTCTT 


CACTGAGTTT 


ATGATGATAT 


CCTTATCAAG 


660 


CGGAACAAGG 


GTACGTGGGT 


CAACAATTTC 


AACTGAAATT 


CCTTCTTCAG 


CTAATTCTTC 


720 


AGCAGCTTGA 


ACCACACGGC 


GAAGCATTTT 


TCCATAAGTG 


ACAACTGTTA 


CATCCGTTCC 


780 


TTGGCGTTTG 


ATTTCACCAA 


CCCCAAGTGG 


AATTGTGTAG 


TCTGGATCAA 


CTGGCACTTC 


840 


CCCTTTTTGG 


TTAAATTCTG 


ACTTGTACTC 


AAGTATAATA 


ACTGGGTTGT 


TATCACGGAT 


900 


AGAAGACTTA 


AGCAGGCCTT 


TCATGTCCGC 


AGGTGTTCCA 


GGTGCCACAA 


CCTTAAGCCC 


960 


TGGAATGTGA 


GTAAACCAAG 


ACTCTAGAGA 


TTGTGAGTGC 


TGGGCGGCAG 


AGCCAACTCC 


1020 


GTTACCAGCT 


GCACAACGAA 


CAGTCATTGG 


AACCTGACCT 


TTACCACCAA 


ACATGTAACG 


1080 


TGTTTTAGCA 


GCTTGGTTGA 


CGATATTGTC 


CATGGCAATA 


ACAGAGAAGT 


CCATGAAGGT 


1140 


CATATCGACG 


ATTGGACGAA 


GTCCTGTCAT 


GGCTGCTCCT 


GCTGCAGCTC 


CAGAGATGGC 


1200 


AGCTTCAGAA 


ATCGGACAGT 


CACGGACACG 


TTCTGGACCA 


AATTCTTCAA 


GCATTCCAAC 


1260 


AGAAGTACCG 


AAGTCTCCTC 


CGAAGACACC 


GACGTCTTCT 


CCCATCAAGA 


ACACATTTTC 


1320 


ATCGCGAACG 


CATTTCCTCA 


GACATAGCAA 


GGATAATGGT 


GTCACGGAAG 


GACATTGTTT 


1380 


TTGTTTCCAT 


TTTATCTCTT 


TCTCCTTAGT 


CTGCGTAAAT 


ATCTTCAAAG 


GCTGATTCAA 


1440 


GCGGTGGGAA 


TGGGCTTTCC 


TCTGCAAATT 


TAACAGAAGC 


TTCTACTGCT 


TCCTTTACTT 


1500 


GCGCTTGGAT 


TTCTTCCAAT 


TCTTCGGCAC 


TTGCAATGTT 


ATTTTCAATA 


AGGTAATTGC 


1560 
GGAGGTTTTC


GATTGGATCT 


TTTTGTTTCC 


ACAATTCCAC 


TTCTTCACGC 


GTACGATATT 


1620 


TACCAGGGTC 


AGATGATGAG 


TGACCGAGCC 


AGCGATAAGT 


TACACTTTCA 


ATCAAGACTG 


1680 


GACCATTGCC 


ACTGCGAACA 


TGGTCTATAG 


CTTTCTGAAA 


TCCTTCATAG 


ACATCGATGA 


1740 


CATTGTTACC 


GTCTTCGATG 


AACATTCCAG 


GAATTCCATA


AGCGGCGCTA 


CGTTGATGGA 


1800 


TATGTTCTAT 


ATTGGTCATT 


TTCTTGATAT 


CCGCAGAGAT 


ACCGTAACCG 


TTGTTAATGC 


1860 


AATAGAAAAT 


GACTGGCAGG 


TTCCAGATAG 


AAGCCATGTT 


CACTGCTTCG 


TGGAAAACAC 


1920 


CTTCATTGGT 


CGCACCATCT 


CCAAAGAAGC 


AGACAACGAT 


TTTACCGGTA 


TTTTGCATTT 


1980 


GCTGACTGAG 


GGCTGCACCG 


ACAGCGATCC 


CCATACCACC 


ACCTACGATA 


CCATTGGCAC 


2040 


CAAGGTTCCC 


AGCATCAAGG 


TCAGCGATAT 


GCATAGATCC 


ACCTTTCCCT 


TTACAGGTTC 


2100 


CAGTGTATTT 


ACCAAGGATT 


TCAGCCATCA 


TTCCGTTGAA 


GTCAATCCCT 


TTAGCAATAG 


2160 


CTTGCCCGTG 


TCCACGGTGG 


TTTGAGGTAA 


TCAGATCATC 


TGGATTGAGA 


GCTACATAG 


2219 



(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1078 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:22: 



CTAACCCTNG 


ACGGGGCCGC 


TATCATCAGT 


CAAACAGCTA 


AAAATCTTGT 


CTGCAAAAGT 


60 


CTCGATTAAC 


TGAGCTTTTA 


CAAAAGCCGT 


ATTTCCTGGA 


ATAACTTGGA 


GATTGATCAT 


120 


CTTATCCATC 


AATTCAGCCG 


ATTCGATATT 


GTCTTCAGCC 


AGTTGCAGAC 


TTTTTACGAT 


180 


TGATTTTGGC 


AATTCGTAGA


CATAGGTGTT 


GTCTCTCAAA 


GGAATTTTGA 


CAATACCTAA 


240 


CTCTTTGATA 


TCTCGGGATA 


CCGTCGCCTG 


AGTGGCAGTG 


ATACCTGCTT 


CTTTCAAATG 


300 


TTCTACAATT 


TCTTCTTGCG 


TGCCGATTTG


ATAATCTGTC 


ACCAATCTTC 


TAATTTTTTC 


360 


AAGTCTCTCT 


TTTTTATTCA 


TTTTTAAATT 


GACTATGCGC 


CCTCTCTACT 


GCTTCTTTAA 


420 


TCTCAGCAAG 


AATCTGATTG 


CTTGCTGACT 


TTTCTTTTTT 


CAAATACACT 


AAAAATTCAA 


480 


TATTTCCATG 


TCCACCTTGG 


ATGGGAGAAA 


AGTCCAAGCC 


AAGGACTGAA 


AAACCTGCCT 


540 


CTACTGCCAT 


AGCTGTTACA 


GATTCAAGGA 


CATTCTGATG 


AATCTTAGCA 


TCTCGAATAA 


600 


TTCCATTTTT 


CCCAATCTGC 


TCACGTCCTG 


CCTCAAACTG 


AGGTTTGACA 


AGTGCTACCA 


660 


CCTGACCTTG 


ATCAGCCAAG 


ACACGGTGCA 


AGGCTGGCAA 


AATCAGACTA 


AGGGAAATGA 


720 


AACTCACATC 


AATACTGGCA 


AAGCTCGGCT 


CCTGCTCGAA 


ATCAGTCTTT 


TCAGCATAGC 


780 


GGAAATTGAA 


CTGCTCCATG 


CTGACAACTC 


GTGGGTCTTG 


GCGTAATTTC 


CAAGCCAACT 


840 


GATTGGTACC 


AACATCGACT 


GCAAAGACCA 


ACTTGGCACT 


ATTCTGTAGC 


ATGACATCGG 


900 


TAAAACCTCC 


AGTAGAGGCC 


CCGATATCAA 


TCGTAGTCGC 


GCCATCCACC 


GACAAATCAA 


960 


AGACCTGCAA 


GGCCCTTTTC 


CAGTTTCAAA 


CCACCACGGC 


TGACATACTT 


GAGTTTCTCC 


1020 


CCCTTGAGTT 


TTAATTCGGT 


GTCATCTGGA 


ATTTCTCTCC 


TGGCTTGTCA 


AACCGTTC 


1078 



(2) INFORMATION FOR SEQ ID NO: 23: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 928 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:23: 



ACTTTCCTGA 


CCTCTGTTTC 


CAAATAATCT 


TCCAAATGGA 


CAGAGATCTA 


CCGTTGTTTG 


60 


CATCGATAGC 


TGAGGTCTTT 


TTTAGAAAAT 


ACCATCACTT 


TTAGAAAATA 


TAAACACATT 


120 


TTTCGGATAA 


GATTAAGGTT 


AAAAGCAGCT 


CGTTTATCCA 


GGGTCTGATG 


ATGGTCTTCA 


180 


CGATAAACCA 


CATCCAATAA 


CCAATGCATA


CTTTCTGCTG 


ACCAATGACC


TCGAACACTA 


240 


TGGCAAAAGG 


TCATCAACAT 


CAAGCTTAAA 


GTTAAAGATA 


AAATAGCGAA 


CGTCTTGACT 


300 


TGTAATACCA 


TCTCTATCAA 


TAGTATTACG


AGTCATTCCA 


ATTCCACGCA 


ATTTATGCCA 


360 


TTTGGGATGG 


TTTTGACACA 


ACCACTTAAC 


ATCAGAAGAC 


ACCCAGTATT 


CTCGAACTTC 


420 


AATCTATCCT 


CTTTCTATAT 


TCTAACTGAA 


AGGACAATTC 


AATGATTCAT 


TTAATAATGA 


480 


TTAGCGCCAT 


TGCTCTAGCC 


ATTGGAATTG 


GTTACCGCAC 


CAAAATCAAT 


ATTGGCCTGC 


540 


TGGCTATTGC 


TTTTTCTTAC 


CTCATCGCAA 


CCACTCTCAT 


GGGATTAAGT 


CCCAAAGAAC 


600 


TTCTTCATTT 


TTGGCCAACC 


TCACTCTTTT 


TTACCATTTT 


TAGCGTCTCT 


CTCTTTTATA 


660 


ACGTTGCAAC 


AACTAACGGT 


ACTCTTGATG 


TTTTGGCTCA 


ACACATTCTC 


TACCGCACAC 


720 


GCACCCACCC 


TAACGCCCTC 


TACATGATTT 


TATACCTGAT 


GGCAACCCTT 


TTGTCTGCTT 


780 


TAGGTGCTGG 


ATTTTTCACT 


ACTATGGCCG 


TTTGCTGTCC 


TCTAGCGATT 


ACCCTCTGTC 


840 


AAAAAGCGGA 


CAAACACCCT 


TTGATTGGAG 


TCAAAGCGTC 


AATGGGAACT 


TCAGGAAGGG 


900 


TAATTTGATA 


ACCAAAGGAA 


TAAAATTT 








928 



(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 847 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 



AAAAACGCAC 


CATATCAAAA 


ACTAAAAAGT 


TTGATATCAT 


GCGTCATGTC 


TTAAACTAAT 


60 


TGACTATACT 


TTCTATTCAA 


ATGAGCTTTT 


AACCAATTGA 


TTGAGCCAAT 


CCACTCTTAA 


120 


AACCAAAGGA 


GCAATTTCTC 


GGCTTAGCTG 


ACTCTTCTCG 


GAATCTGAAC 


CATGTACAAC 


180 


ATTTTGGATA 


ATCTCATTTT 


CTCCAGCAGC 


TTTTGCAAAA 


TCACCTCGAA 


TAGTGCCTGG 


240 


TAAAGCTTCT 


TCTGGACGAG 


TTGCACCCAT 


CATGGTCCGC 


CAAGTTTCGA 


TTACTTTGGG 


300 
A c*c* ara a a mp

J\^\^I\KjPiJ\±\ 1 \J 


ALALLLALAA 




I CjAAGTC atg 


AATTCACGAA 


TCGGTGGGTA 


360


A A A A PTPTf A 


\— V-Z-iA^VwAAVj 1 


1 VjjA 1 Avj X <j 


p< mpi pi m/~i 7v a Ti/" 1 

L. 1 UG 1 CAA 1 C 


AACTCTTCTG 


AAAACCTGTG 


420


A APn a a arTP 


A A r pfpT"T , T l PP 
V— x-li-il 1X1 l^_kj 


A TTP T 1 A A a iPP 
Alibi /Wi, 1 L. 


pi tv ppmmpfTimp 


GATGCGCTTT 


*7v tv /-i tv /^i mm ^ 7v r~\ 

AACACTTCAC 


480


rrar r ranprr 


1^11111 -f-i 


L- 1^-1 \ab 1 i 


1 1 LtA 1 AAA 


GAATGTTTGT 


TCCATACCCG 


540




r 1 app r r r PP f T ir r r P 

^rTL.V3Vw 11^111 


/~<mrprprp t\ rnrnrnrp 
LliilAiiii 


TV P 1 P* AP7\ rnprnp 


/"I m /■*»/""» TV TV 7V TV TV m 

GTGGAAAAAT 


f% fy tv /"■■» tv tv tv /~i mm 

GGAGAAAGTT 


600 


TTCAGAAGAG 


AGAATGAGAG 


AACCCTCGGG 


TTCTCTCATT 


CTCTCTTATT 


CTACTGTTTC 


660 


TTCCACAGTG 


TCAACGGCAG 


TATCCACAAC 


TACTTCTGTT 


GTTTCTTCAT 


TTCCTTCTTC 


720 


CTCTACTGGA 


GGATTAAGGT 


ATTCTTCTTC 


GTTGACAGCA 


TGTGGTTCAA 


GGTTACGGTA 


780 


ACGGGCCATA 


CCAGTACCAG 


CTGGGATGAT 


CTTACCGATG 


AATAACATTT 


TCCTTTAAAT 


840 


TCCAAGG 












847 



(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 578 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 



ACAACCTAAC 


TACCGNCTAA 


TTCAGCGCGA 


ACTTCTGCAG 


TAGCTGCTTC 


AACAACTTCA 


60 


CGACGTGAAA 


GGATGAAGCG 


GTTTTCTTTA 


GCGTTAACTT 


CTTTGATTTT 


AGTATCAAAT 


120 


TCTTGACCTA 


CAAAACGCTC 


AGCGTTACGT 


ACGAAACGAG


TATCCAACAT 


TGAAGCTGGG 


180 


ATAAATCCAC 


GAACACCTTC 


AAATTCTACT 


GAAAGTCCAC 


CTTTAACGGC 


ACGCGTTCCT 


240 


TTAACAGTAA 


CAACTTCTTC 


TTCGCGACCA 


ACAAGTTTGT 


CCCATGCTTT 


GCGAGCTTCA 


300 


AGGCGTTTTT 


TAGATGACAA 


GGTATGTAAC 


TGTATCAGTA 


TCTTTACCAA 


CTACTTGACG 


360 


AAGTACAAGA 


ACATCCAATA 


CTTCTCCTAC 


TTTAACAAAG 


TCATTGATAT 


CTGCATCACG 


420 


ATCGTTTGTC 


AATTCGCGAA 


GAGTCAAGAC 


ACCCTTCAAC 


ACCAGTTCCC 


AGAAGAATGC 


480 


AACGTTAGCT 


TGAGTCGCAT 


CAACTGTCAA 


TACTTCAGCA 


CTAACACATC 


ACCAGTCTCA 


540 


ACTTGACTNA 


CGCTATTGAG 


CANATCTTCA 


AATTCGAT 






578 



(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 888 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:26: 
br I AGTTATAb


m a P'P'P^ptm/— »/-i/-i 
X AbjbjbjbrTCGG 


■a mmyi t\ a t\ m/~i /^i 

ATTGAAATGC 


CACNGCGCTT 


CTTGGAGTTT 


CTGATACCGT 


60 


1 XAAAAXAbrb 


pmmpppp 7\ mm 
br X X brbrbibA X X 


b, I Gbx 1 1 biGGA 


br TCAGAGCCT 


m tv rnn tv tv /-i s~*/^t s~\ 

TATCAAGCGC 


AATCATGATA 


120


pi p< rp rp pi pi rp rp 0 pi 
brbr I 1 brbj 1 1 brbj 


rn a T 1 A f" 1 Ti A P" 1 m rn 
1 A 1 Abj 1 Abj 1 1 


br X b, X AbibiA X A 


A P* 0 m/'i/Tnm/'^ m 

ACC X GGTTi CT 


mo /~i m /-1 /—1 mm 7v /~t 

TGGTCGTTAG 


/irt> /1 /-1 m /-i m /"i 

GCACCTGGTG 


180 


b7AbjbjAAbjbj 1 I 


pi rpp« A f" 1 P 1 A A rprp 
bj 1 bAbrbAA I 1 


CTCCTTTTTG 


ACGAAATTCT 


TCAGCGTTGT 


CTGTCGCCAG 


240 


rp 7\ tv pi rn a rnrprprp 
1 AAL 1A1 111 


rp pi P 1 rp pi rp rp rp rp rp 
ILL Ibil 1 X I 1 


X bj Abr X X X bi X bj 


TCGGTTTTTC 


m/"i 7v tv /"•tmm/'iTv m 

TGAAGTTCAT 


TTTCAACACG 


300




rpp 1 a rrnpppprn 
X LAb 1 brbibb. X 


pi pi mpi rnrnrnp' a pi 
bib X br X X X biAC 


GCGGTCGCGC 


TCAGCCTTAT 


CCTTATAGTA 


360


bibi X bx X LLAAL 


AAA rppi A f~* A A A 

AAA i b, AbrAAA 


pi 7\ mmmpp a a a 

brA XXX Gb. AAA 


7V f~°* m /""i m /^t /~i /~i 

AGGCTCTCCC 


ACCTGATTTG 


CAAAAGGAAC 


420


rp/~i/^i tv pirppi j\ i\p 
1 bxbiAb X bAAb 


pi A APfTPTlPAP 
biAAbj X b, X b-Abr 


mp a APP^mpp 

X b. AALiL, A X brbj 


CTTGGTTTCC 


TGATTGAAAA 


AATTTCGGAA 


480 


a r^r^c^r* a a a p , rp 
Abrb. brbj AAA br 1 


1111 b Ab, 1 AA 


pp t\ pirn a m/^i/^im 

C b, Abr X AT. CCT 


TTCCAATTCA 


TTTGCCGTAT 


CGCGTCCCAG 


540 


ACCTTGAAAG 


AGGCTTTGAA 


GATTTTTTGC 


TGTTAGTTCT 


TGGGTTTGCA 


GGATTTCAAA 


600 


GAGCTTTTCA 


TCCTTGATAG 


TAAAAGGATT 


GAGAGATTCT 


GTACTTGGCG 


GAGCGATATA 


660 


GGTCGATCCT 


GGAAGTAAGG 


TGCGGTAGCT 


ATTTTGTGAA 


AAGCCGACGT 


GTTTGATAAC 


720 


TTCGAGGATT 


TTATGACTGC


TTTTATCCGA 


CCAGTTAGAA 


TATTACTGTG


TTTCCCCATA 


780 


ATTTCGATAA 


TCAAGGTAGC 


CTGGATATGG 


TCTCCAATCT 


CGTTTTTATT 


GGAAACTGTA 


840 


ATTTCCACAA 


TACGGTCATT 


TTCCACTTGC 


TCAATCGACT 


CAATCAGG 




888 



(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 513 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 



ATCGAATTTT 


GTTCTTTCAT 


AGAGAGCTAC 


CTGAGTTCTA 


TTCAAGCTCA 


GGTAGTACTT 


60 


TCTTATAAAC 


TAGACAAACT 


AACTGTCATT 


CTACCATCAG 


ATTACAAGAC 


ATCATCGTCA 


120 


CTCACCTTGG 


AATTCAATGT 


CGTACCCCAA 


TGGGTAATTT 


TACGGTGGGG 


TTGAGCTAAA 


180 


ATTGGTCTGT 


TTTCATAGAT 


TGTTTGCCAT 


CTATTCCATA 


GTAGGCCCGT 


CTTTTTCTCA 


240 


ATCTTAACTC 


GCAGATTTCT 


CATATTTTCT 


TTGATTGGGA 


GGTTGAGGAC 


AAAACCTGCA 


300 


GTCTGGTTGC 


GACCGTTTCC 


TTCCCAAGAA 


TGACTACGAA 


CAACTTGGTT 


TCCATCTTTA 


360 


TCTACTGGAA 


CTTCTTCCCA 


AGTTATGGAG 


TAGCGGGCAA 


TGTAAGCTCC 


ACTGTGTTGA 


420 


ATTATCAATG 


TTTTATCTTT 


CACAGGGAGT 


CTGACTGATT 


GGTTGAACTG 


GCTTAGAAAC 


480 


TTGTGTCGCC 


GTTTCAGCAT 


TCGTAGCTAT 


AAA 






513 



(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 214 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 

ATCGAATTCT AACATGTGCT TCTCCTTCTA TTGTTCCTAT CTTTAAAATC TACTCCTTCA 60

TGCTCCAAGA GCCAAGCTTT CTTTTCCACT CCTGCAGCAT AACCTGTCAG ACGCTTGCCT 120

GCTCCCAACA CACGATGACA AGGTACTAGG ATAGACCAAG GATTGCGTCC CACTGCTCCA 180

CCAATTGCTT GAGCAGAAGC CACTTGCAGG TCTT 214 

(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1084 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 



CTCCAGCAAT 


GGATCCAAGT 


ATGATGGGCG 


GGATGATGTA 


AGCTTTCTAT 


AGAAAACACC 


60 


TTATAAAAAA 


CACGAAAGGA 


GGGAATGACT 


AACCCTTCTT 


TTTATAATAT 


TCACTTCTAA 


120 


GATTGATGGT 


GAGCTCTCCT 


AACTTATATG 


ATAAAATAAG 


ACTAGAGGAA


AGGAGAAGAA 


180 


CATGATCGAT 


GTACAAGAAA 


TTCTGTGCAA 


GATGACCCCC 


AATCAGAAGA 


TTAATTATGA 


240 


CCGTGTCATG 


CAGAAAATGG 


TACAAGCATG 


GGAAAAAAAT 


GAGTAGCGGC 


CAACCATTCT 


300 


CGTGCATGTT 


TGCTGTGCCC 


CTTGTAGTAC 


CTATACACTA


GAATATTTGA 


CCAAGTATGC 


360 


AGATGTGACC 


ATCTATTTTG 


CCAATTCTAA 


TATCCATCCC 


AAGGCAGAAT 


ACCATAAGCG 


420 


GGTCTATGTC 


ACCAAGAAAT


TTGTTAGTGA 


TTTTAATGAG 


CAGACAGGAA 


ATACGGTTCA 


480 


GTACCTAGAA 


GCTCCCTACG 


AACCCAATTA 


ATACCGAAAA 


CTAGTTAGGG 


GGCTAGAGGA 


540 


GGAGCCCGAA 


GGTGGCGACC 


GTTGCAAGGT 


TTGTTTTGAC 


TACCGACTGG 


ATAAAACAGC 


600 


GCAAGTGGCT 


ATGGACTTGG 


GCTTTGACTA 


CTTTGGTTCA 


GCCTTGACCA 


TCAGTCCTCA 


660 


TAAGAATTCT 


CAAACTATCA 


ATAGCATCGG 


AATCGATGTG 


CAAAAAATTT 


ACACGCCCCA 


720 


CTATCTTCCC 


AACGATTTCA 


AGAAAAATCA 


AGGCTACAAA 


CGTTCAGTAG 


AGATGCGTGA 


780 


GGAGTATGAT 


ATCTATCGTC 


AATGTTATTG 


TGGCTGCGTC 


TATGCAGCCC 


AAGCCCAGAA 


840 


TATTGACCTG 


GTTTAAGTTG 


AGTAGGACGC 


CACAGCATGC 


TTGCTGGATA 


AGGATGTTGA 


900 


GAAAGACTAT


TCTCATATCA 


CATTTATAGT 


AGATTGAAAC 


TAGAATAGTA 


CACCTTTACT 


960 


TCTCAAACAT 


TGTTAGAAAT 


CGATTCGGCT 


GTCCTTATTT 


CATTTTAATA 


TACTGGTACG 


1020 


AAATTAGATA 


TATCAATGAT 


AACTTGCCTC 


AAGGTAGGTT 


TTTTGATAGT 


AGAAAAGCGA


1080 


TAGA 












1084 



(2) INFORMATION FOR SEQ ID NO: 30: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1124 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:



ATCGAATTCA 


TTGACTGCCT 


GAAAAGACTT 


CAACTCGTCT 


GCCTGATAAC 


CGAAAGACTT 


60 


GGTTACTTTG 


ATACCTGATA 


CGGACTCCTG 


TACCTTGTTA 


TTGAGTTCAG 


AAAAAGCAGC 


120 


TTGGGATTCG 


CCAAAGGCCT 


TATGAGTCTT 


TCTCCCTAGG 


CGACTAGTCG 


TATAGGCCAT 


180 


GAAAGGTAGG 


GGGAGAATGG 


CAACAAGAGT 


CATCTGCCAT 


GAGATGCTAA 


AGAGCATGGT 


240 


CAACAAAGTC 


ACCAGAGCCG 


TGATAGAGGC 


ATCCACCGCA 


GACATGACAC 


CGCCACCTGC 


300 


TAAACGAGTC 


AAGGAATTGA 


TATCATTGGT 


TGCGTGTGCC 


ATCAGATCAC 


CCGTCCGATA 


360 


GGTTTGATAA 


AAGGCTGACG 


ACATTTTTGT 


GAAATGCTTA 


AACAAGCGAG 


ACCGCATGAT 


420 


CTGTCCCAAG 


CAATAAGAGG 


TCCCAAGGAT 


ATACATACGC


CACACATAGC 


GCAAATAGTA 


480 


CATACCAAAG 


GCTGCAAGTA 


GCAAGTAAAA 


TAGGCTAAGA 


AGGAGGTCCT 


GCTGGGTTAA 


540 


TTGCCCCGAT 


GTGATGGCAT 


CAATAACCCG 


CCCCATAACC 


ATAGGAGGAA 


TGAGATTGAG 


600 


GACGGAAACC 


AAGACCAGGG 


CCACAATCCC 


GACTAGATAA


CGGCGTTTTT 


CTAACTTGAA 


660 


AAACCACCAA 


AATTTTTGAA 


TAATGGACAT 


AAAATCCCTT 


TCTGGATTGC 


AAATAGAAAC 


720 


CTGAGGCCAA 


TACTCAATGG 


AAAATCAAAG 


AGCAAACTAG 


GAAACTAGCC 


GCAGGCTGCT 


780 


CAAAGCACTG 


CTTTGAGGTT 


GTAGATAGAA 


CTGACGAAGT 


CAGTAACCTA 


CATACGGCAA 


840 


GGCGACGTTG 


ACGCCGTTTG 


AAGAAATTTC 


CGAAGAATAC 


AAGACCCCAG 


GTTTTTCTTA 


900 


TTTATAAGTT 


ACCACTGTAA 


CAGCACCCTT 


GTCATATTCA 


GCAATAAAGA 


TATTGGCTAC 


960 


ATTGTCATGC 


CCTTGTTTAC 


TGAGGTTATC 


AAGCAACCAC 


TCCTCGCTAC 


GAACAATCGA 


1020 


TCCCAAGACA 


TCTACTTGAA 


TCACACCGTC 


AGTCACAACT 


GGATACTTAG


GATTTTCATC 


1080 


TCCCATTTGC 


ACAACGATGA 


GTTGCCCATT 


TTGCTCTTGC 


ACAG 




1124 



(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1242 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 
TTACCTTCAT TGCAGCCATT ATTGGTTCTT GTGTCAGCCA GATTTTAAGT ATTCTTTATA 60
AbAb AC CTbb


Ibl bb 1 LTTT 


7> m p m m p p p p a 

ATCTTGGbbA 


TTTTGGCACC 


GCTGGTTCCA 


GGTTATCTCT 


120


bb XAb bbAAb 


AAblbbbl X X 


m nn rn p mp a^ap 
1 1 1 b 1 bALAb 


PP TV pm a m 71 a 

GGGACTATAA 


TAAAGCACTG 


GCAAGTGCGA 


180


PnTTPr 1 rr-irpp"' rp 
LLll bb 1 Ibl 


b/\ X br X 1 bbb X 


mmppm a a mprn 
X X bb 1 AA Ibl 


pm a rnrppp A Am 
b X A 1 X bbAA X 


PPPTA p/-tp/-i 71 

bbb 1 AbbGGA 


a pi a pmn a mmp 

AbAGTGATTb 


240


rppi Apj /-«rppirp 7\ 
X b AbAb IblA 


mp a rprp a rp A rp A 
XbAX XAXA1A 


AAAAPAPA mp 

AAAAb Ab A X b 


P a P m 7\ T"IPP rn 7\ 

bAbl AlbblA 


p a pmmm a p a p 

bAb 1 I 1 ACAG 


AAA m A A A A P A 

AAATAAAAGA 




A rprnmrp p mp a a 

1 i It 1 b AA 


A A A rpp A P A rp a 
AAA X b Ab A 1 A 


A A rp A A A rprp a A 

AAX AAAX X AA 


pi a A ppipmmmp 
bAAbbb X X X b 


m a m a mp mp pi pi 
X A I A X GTGCG 


A /■*» A A m A /~1PP P 

AGAATACCGC


360


Af rnm A mp Ajf 
.rib 1 inl b-tt-rlb 


-M-rt-Z-i 1 1 bbbbb 


rpp A T"7i rp r PP ,, P" ,r P 
1 bA 1111 bib 1 


AfPPAPPA A A 

A X b b Ab bAAA 


pp a a p mm a a m 
bbAAb X X AA I 


<^~i /^i m /~i /~i a p /~i 

b b GTCGGAGb 


420


LAA 1 bbb 1 lb 


A TlPrPTi AP rpp rp 
AAb 1 AAb lb I 


mprprnpA A AP^P 
Ibl 1 b AAAb X 


P" 1 P 1 rnp mm a PP A 
bblbl XAbbA 


mmmp a a a a a p 
1X1 LAAAAAb 


TCATCTTAGT 


480


/-'/-n-pf-' 7\p7\ a rp a 
bit 1 bAb/VA 1 /i 


bbb 1 br/\ 1 Ibl 


PP a rpp p a A P^ A 
bbA X bbAAb A 


P A PP m A A A A A 

bAbrb X AAAAA 


mpA A rnppppp 

i bAA rbbbCC 


m A A A A A A P A A 

TAAAAAACAA 


540


mm jnnri a A rnp 
1 1 Abb bAA 1 b 


A mmpmppm a a 
A X X b 1 bb X AA 


A A A P A A A rprprp 

AAAb AAAX X X 


P 1 A pppm a mp a 
bAbbb 1 ATGA 


A P"«P^/TnP"« A PP P 

AGGCTCAGGC 


p a rnmpm/~t a p a 

GATTGTCACA 


/T A A 

600 


1 b AAbbbA 


P** A A rprppii \ ii i irpp 
bAA 1 Ibl 1 1L 


X X lbbAlAlb 


PPiTipmP A A pm 
bb Ibl bAAb X 


a mmpmp a mp a 

AT Tb 1 b ATGA 


mAmPA a /'"immp 

TATGAAGTTG 


660


rnmr 1 A A A A rpp A 
± 1 b AAAA 1 bA 


prnpPP APA A A 
br X bbbAbAAA 


X A 1 bbbAbAA 


PPTIPP AAAAA 

bb 1 bbAAAAA 


TbTTGGCTGA 


TAGTGGTTAT 


720


P"*A 

bAAbbbbbbA 


mp a A P 1 A rp a rn A 
X bAAb A X A X A 


mpprnp a appa 
X bb 1 b AAbbA 


f\ AAA /~tm/~i /~1 A P 

bAAACTCCAC 


p m aaa m/""i /~i tv /~« 

GTAAATCCAG 


CAAACTCAAG


780


PPPprn7\ 7\m7i n 
bbbb X AA X Abr 


pmp a A P A m A A 

b X bAAb A X AA 


A ppmmA m A A p 

Abb 1 X AX AAb 


/~i 71 rnp f\ m a m 

bATGCGCTAT 


/-^ *n TV jP**1 iP*1 TV T\ 

CCAAGGAGAG


AAGCAAGGTT 


840 


P A P A A P A mprp 
bAbAAb A X b 1 


rprnpPP7\ A Aprn 

X XGbbAAAbl 


a a a a a p p mmm 

AAAAACGTTT 


AAAATGTTTT 


CAACAACCTA 


TCGAAATCAT 


900 


oprpT\ a A P P P m 

b G X AAAb bib T 


mppp a mm a pp 
X bGGATTACG 


7\ t\ m/~> 7\ -A mmmp 

AATGAATTTG 


ATTGCTGGCA 


TTATCAATTA 


TGAACTAGGA 


960 


TTCTAGTTTT 


GCAGGAAGTC 


TATTATTTTC 


CTTATTGTCT 


GTAAGTCTAC 


TGACCTTGTT 


1020 


GTTTATCCCA 


GTCATGGTTT 


CTAGTTCGGG 


CTCAGAGTTT 


CAAAGTGGAT 


GGCAAGAGCA 


1080 


TCAATTGATT 


GCTGAGAAGG 


TTAGTAAAAC 


ACTTGACAAG 


ACATTTGATA 


AGGATGTCAG 


1140 


AAAAATTCCG 


ACCAGTCAGT


TTTATCAAAA 


ATTTGTAGAT 


GAGATGGGAA 


GGATTTACTC 


1200 


AGGAAATTTG 


ATCCTCCCAG 


GAGCTGATAA 


CTGTGAATGG 


AG 




1242 



(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1575 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32: 



GTGATGGGGC 


CTCAGGGAAA 


TGGTTTTGAC 


TTGTCTGACC 


TTGATGAGCA 


GAATCAGGTT 


60 


CTCCTTGTTG 


GTGGTGGGAT 


TGGTGTTCCA 


CCCTTGCTTG 


AGGTGGCCAA 


GGAATTGCAT 


120 


GAACGTGGAG 


TGAAAGTAGT 


GACAGTCCTC 


GGTTTTGCTA 


ATAAGGATGC 


TGTTATTTTG 


180 


AAAACGGAAT 


TGGCTCAGTA 


TGGTCAGGTC 


TTTGTAACGA


CAGATGATGG 


TTCTTATGGC 


240 


ATCAAGGGAA 


ATGTTCCGTT 


GTTATCAATG 


ATTTAGATAG 


TCAGTTTGAT 


GCTGTTTACT 


300 


CGTGTGGGGC 


TCCAGGAATG 


ATGAAGTATA 


TCAATCAAAC 


CTTTGATGAT 


CACCCAAGAG 


360 


CCTATTTATC 


TCTGGAATCT 


CGTATGGCTT 


GTGGGATGGG 


AGCTTGCTAT 


GCCTGTGTTC 


420 


TAAAAGTACC 


AGAAAGCGAG 


ACGGTCAGCC 


AACGCGTCTG 


TGAAGATGGT 


CCTGTTTTCC 


480 


GCACAGGAAC 


AGTTGTATTA 


TAAGGAGAAA 


ATTATGACTA 


CAAATCGATT 


ACAAGTGTCT 


540 


CTACCTGGTT 


TGGATTTGAA 


AAATCCGATT 


ATTCCAGCAT 


CAGGCTGTTT 


TGGCTTTGGA 


600 


CAAGAGTATG 


CCAAGTACTA 


TGATTTAGAC 


CTTTTAGGTT 


CTATTATGAT


CAAGGCGACA 


660 



146 



SUBSTITUTE SHEET (RULE 26) 



WO 98/19689 



PCT/US97/19226 



AV — 1 1 OriTi^. 


v,nv^vji x x x vjvj 


PA ATPP A APT 


HP A A P 1 A prppp 
V_V_ttAvjAvjr 1 Vjvj 


v_ A vjj A vjt A v_ Vjj v_ V_ 


1 GCTGCj 1 A 1 Cj


720


V.- X ' — fWA X VjV_tttt. 


TTPPPTTPP A 

X X VJJ VJV_ X X VjV^/i 


A A ATPPT'PPT' 
^\x-Lrl X ^ v_ X uu X 


1 X AvjAvjVj 1 1 vjj 


1111 CjjvjC_ I CjA 


AAAGCTACC x 


780


TGGCTGGAAA 


G AG AAT ATP P 


AAATPTTPPT 

nAri x v_ X lev. i 


A mmammrrr 1 a 

tt. X Xtt.1 IbLLn 


amrmarpmnp 

/lib 1 Avjv_ 1 vjjvj 


11 1 1 1 C_ AAAA 


840


CAAGAGTATG 


CAGCTGTTTP 


TPATGGGATT 

x v_.fi. x vjvjjvjtt. X X 


TPP A APPP A A 
X v- v. ArivJ Vj v-rlA 


pmaamamaaa 

v_ 1 AA1 Al AAA 


A PPmA 'T^P 1 P , A P 1 
AbL 1 A 1 v_ vjjAvj 


900


CTCAATATTT

V- X V_- -Oil X xi 


CTTGTCCPAA 

v_ J. x vj x v_ v_ v^nri 


TPTTPAPPAP 

X vjj X X VJrlv- V_tt.v — 


TPTA ATT 1 ATT* 
X vj X tttt. lLniu 


p 1 a p 1 't"t' ^ p ^ pp , a t> 
vjAv_ 1111 vjA 1 


1 Vjjvj 1 v_ AAvjA 1 


960


CCAGATTTGG 


CTTATGATGT 

V- X X ii. X vjii X vj X 


GGTPAAAPPA 

OvJ X VJJ.MJ-Vtt.vjJ V_ tt. 


PPTPTPPA ar 

UV- X Vjj x VjVJtttt. VJJ 


ppmp ara a p"!" 1 

v_v_ 1 V_AvjjAAvjj 1 


vjv_v_Avj! 1 1A1 


1020


GTCAAATTAA 

vj x v^nxxn. x x xxxi 


CCCCGAGTGT 


GAPPPATATP 

vjfiv_ v_ vjn x ii. x v_ 


PTT A PTPTPP 

vJT X X ttv_ X VJJ X V_ VJJ 


P A A A aPPTPP 
LAAAAbL 1 vjjv_ 


A P* A APA rnp P 1 P 1 
AvjAAvjA 1 vjjv_ vjj 


1080


GGAGCAAGTG 


GCTTGACTAT 

VJ V — X X VJJXi. V— X xi. X 


PATPATAPTP 

VJJ tt X V_tt X ttv_ X V_ 


TPPTPrzn a tp 

X VJJVJJ 1 VJ VJTVJtt. 1 VJJ 


pppmmmp a p'p' 

L-VjjL. Ill VjjAv_v_ 


rnP'A A A A P'P* A P* 

1 v_AAAAv_v_ACjj 


1140


AAAACCAATC 


TTGGCCAATG 

x x \jvj vvriii x vj 


GAAPAPPTPP 

VJtt-ttV_tt,VJVJ X VjJvjJ 


A ATPTP arrm 

tttt. X vjj x v_.tt.vj vjj 1 


ppappa P 1 mmm 


rn P'P 1 APmAPPP 
1 LLAb 1 Avjjv_v_ 


1200


CTCAAAPTPA 


TPPGPPAAPT 


APPPP A A AP A 

tt.VjV_ V_ V tt_tt-ttv_,tt. 


tt.V_tt.VJJtt.V_V_ 1 VJV_ 


v_ 1 A 1 V_ A 1 1 VJVJ 


AA 1 vjjCjjCjjvjjvjjvjA 


1260


vj x vj vjn x x vj vj 


CTGAAGPTGr 

V — - X VJfiTivj\_ X VJJ V 


PPTAPA A ATP 

V— V- X ttvjJtt-tttt. 1 VJ 


mamrmpppmp 

1 tt X v_ 1 Vjjvjv_ 1 vj 


Vjvjvj v_ A 1 v_ 1 (jjv_ 


m a rn/~i/~i s~* a o mm 
1 A 1 CGCjACj 1 1 


1320


bbAACAG C T A 


ACTTTACCAA 


TCCTTATGCC 


TGCCCTGACA 


TCATCGAAAA 


TTTAC C AAAA 


1380 


GTCATGGATA 


AATACGGTAT 


TAGCAGTCTG 


GAAGAACTCC 


GTCAGGAAGT 


AAAAGAGTCT 


1440 


CTGAGGTAAA 


CTGCAATCAA 


TCTGTTCTTG 


ATTTTTTATT 


AGTTTGTAAT 


ATGAATTTAG 


1500 


GAGAATTTTG 


GTACAATAAA 


ATAAATAAGA 


ACAGAGGAAG 


AAGGTTAATG 


AAGAAAGTAA 


1560 


GATTTATTTT 


TTTAG 










1575 



(2) INFORMATION FOR SEQ ID NO: 33:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 776 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:



CTAAGATATC 


AGAATAACAA 


CGAAATCGAA 


GCATTAAAAA 


CAAATATTAC 


TTCTAAGAAT 


60 


AGCGAGATTG 


ATAGTCAACA 


AAGCAATATT 


AAGGATATGA 


CCGTACCTAT 


AATGATCCAA 


120 


CTTCTCAGGC 


TTATAATATT 


TATGCTCAAT 


TAATTAGTGA 


GTTAGGTACT 


GCTCGTTCAA 


180 


ACAACAATAA 


AAGTATTACA 


GAGCTTGAGG 


CTAATCTTGG 


AGTGGCAACA 


GGTCAAGATA 


240 


AAGCTCATAG 


TATATTAGCG 


TCAAATGAAG 


GTACTCTGCA 


TTATCTGGTA 


CCTTTGAAAC 


300 


AAGGAATGTC 


TATTCAGCAG 


GGGCAAACGA 


TAGCAGAAGT 


TTCAGGGAAA 


GAAAAAGGTT 


360 


ACTATGTAGA 


GGCTTTTGTA 


CTTGCGAGTG 


ATATTTCTCG 


TGTTTCAAAA 


GGAGCAAAAG 


420 


TTGATGTTGC 


TATTACTGGT 


GTGAATAGTC 


AAAAATATGG 


AACACTAAAG 


GGACAAGTCA 


480 


GACAGATTGA 


TTCAGGAACA 


ATTTCCCAAG 


AAACGAAAGA 


GGGGAATATT 


AGCCTCTATA 


540 


AAGTCATGAT 


AGAATTAGAA 


ACCTTAACTC 


TAAAACATGG 


AAGCGAGACG 


GTCATACTCC 


600 


AAAAGGATAT 


GCCAGTTGAA 


GTGCGGATTG 


TCTATGATAA 


AGAAACCTAT 


CTTGATTGGA 


660 


TTTTAGAAAT 


GTTAAGTTTC 


AAGCAATAAT 


TGGTTTTAAA 


CCTTAGGTAA 


CCTATAAAAA 


720 


CAAATAAGGT 


AGAGAAAGGA 


TATTTTATCT 


AAGTTAGCTC 


ACATTACTGC 


CATTCC 


776 



(2) INFORMATION FOR SEQ ID NO: 34:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1487 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34: 



CTGGCCTTTC 


TCCACCAAAA 


TTGTTCCTTG 


AGGGAAGGAA 


GTCAGAACAC 


TAGCCGTTGC 


60 


ATCTTCCTTT 


TGCTTTTCAA 


TCGTAATTCC 


AGATAATTTT 


TCCCATTCTT 


TTTGGTGACC 


120 


CCGGGAGGCA 


GGATTGAATG 


GCTTGAGGGA 


AATGACAAAC 


TTGTCCTAGC 


AAGAATGGTC 


180 


AAGGCACCTC 


CGTCTACAAT 


CAAAATCTGA 


TTTGGGCTTA 


AATTAACAAA 


GACCTGTTTT 


240 


ACTAGATTTT 


CTCCAGAAGC 


ATCGTCTCGT 


AAACCAGGCC 


CCAGCAAGAT 


AACTTCTGCC 


300 


TTCTCCAATT 


GCTCTTTTAA 


CAATTGCTGG 


TCTTGAAGAG 


AAAAGGCCAT


AGGCTCAGGT 


360 


AAATGGCTGT 


GCAGAGCCGG 


GATATTTTCC 


CTGTCCGTTC 


CAACGGTCAC 


CAATCCTGCA 


420 


CCGCTTTTTA 


CAGCTGCTAA 


AGCAGCCATG 


ATGATGGCAC 


CTCCATAAGG 


ATAAGTACCA 


480 


CCAAGCAGCA 


GCAGACGACC 


ATAATCTCCT 


TTATGACTTG 


AACGAGAACG 


TTCAATAATA 


540 


ACTTTTTCTA 


GTAAGGTTTG 


ATTAATCACT 


TTCATCCTTT 


TTCCCTCTCA 


CTTTTATTAT 


600 


ACAACAAAAA 


GGAGACGCAG 


ACCTCCTTTT 


GTAATCTTAT 


ATCTAAAATT 


TAATATTCAT 


660 


TTCTGCCATT 


TTAGATATAG 


CTATAGAAAA


TACACTCTAT 


TAATCGAATG 


TTTCTCTTAT 


720 


TTTCTATCCA 


ATGTCCGAAG 


TGCTGCTTGA 


TAAGTTTGCT 


CCATCAGCAT 


GGTAATGGTC 


780 


ATAGGACCGA 


CACCTCCAGG 


GACTGGCGTG 


ATATGGCTAG 


CAAGTGGTGC 


AACTGCCTCA 


840 


TAATCAACAT 


CTCCACAGAG 


CTTCCCATTT 


TCATCTCGGT 


TCATCCCAAC 


GTCAATGACA 


900 


ACCGCACCTG 


GTTTGACAAA 


GTCAGCAGTC 


ACAAACTTGG 


CGCGGCCGAT 


TGCGACTACA 


960 


AGAATATCTG 


CTTTAGCAGC 


CACCTTGGCA 


AGATTATGAG 


TTCGTGAGTG 


GGCCAAGGTT 


1020 


ACTGTCGCAT 


TTTTAGCCAA 


AAGAAGCTGA


GCCATAGGTT 


TTCCAACGAT 


ATTTGAACGA 


1080 


CCGATTACGA 


CCGCATTTTT 


ACCTTCCAAG 


TCAATCCCAT 


ATTCATGAAA 


CATTTCCATA 


1140 


ATTCCTGCAG 


GTGTCGAGGG 


AATCATGACT 


GGATGTCCAG 


ACCAAAGACG 


TCCCATGTTT 


1200 


AGGGGATGGA 


AACCATCCAC 


ATCCTTTTCT 


GGGTCAATGG 


CTAATAAAAC 


CGCCTCTTCA 


1260 


TCGATATGTT 


TTGGTAATGG 


CAACTGGACC 


AAAATCCCAT


GCCAAGCTGG 


ATCCTGATTA 


1320 


TATTTAGCAA 


TCAGGTCTAA 


CAATTCCTCT 


TGAGTAATGG 


TCTCTGGAAC 


TCGCACTACT 


1380 


TCGGTACGGG 


AACCAGCCGC 


AAGAGCTGAC 


CTCTCCTTGT 


TGCGAACGTT 


AAACTTGGCT 


1440 


GGCTGGATTA 


TCCCCAACCA 


AAATCACTAC 


CAAACCAGGC 


ACTAGAG 




1487 



(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1634 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:



CGTGCCTTGG 


CCAATGATCC


AAAAATCTTG 


ATTTCAGACG 


AGTCGCTTCA 


AATTTCGGCC 


60 


LCTGGACCCT 


TAAGACCAAC 


CCAAGCAGAT


TTTGGCCCTT 


GGTTGCAAGA 


TTTGAACCAA 


120


AAATTAGGCT


TGACTGTTGT 


CCTGATTACG


CATGAAATGC 


AGATTGTCAA 


AGACATTGCC 


180 


AACCGTGTTG 


CAGTTATGCA 


GGATGGGCAT 


TTGATTGAAG 


AGAGTAGTGT 


GCTTGAAATC 


240 


TTCTCAGACC 


CTAAACAACC


TTTGACTCAA 


GACTTTATCT 


CAACAGCTAC 


AGGTATTGAC 


300 


GAAGCCATGG 


TCAAAATCGA 


GAAGCAAGAA 


ATCGTGGAAC 


ACTTGTCTGA 


AAACAGTCTC 


360 


TTGGTGCAAC 


TCAAGTACGC 


TGGATCTTCA 


ACAGACGAGC 


CACTTTTGAA 


TGAATTGTAC 


420 


AAGCATTATC 


AAGTAATGGC 


TAATATTCTC 


TATGGGAATA 


TCGAAATCCT 


CGATGGTACT 


480 


CCTGTTGGAG 


AATTGGTGGT 


GGTCTTGTCA 


GGTGAAAAAG 


CAGCGCTGGC 


AGGTGCTCAA 


540 


GAAGCCATTC 


GTCAAGCAGG 


CGTACAGTTA 


AAAGTATTGA 


AGGGAGGACA 


GTAAGATGGA 


600 


ATCATTGATT 


CAAACCTATT 


TACCAAATGT


CTATAAGATG 


GGTTGGTCTG 


GTCAGGCAGG 


660 


CTGGGGAACA 


GCTATCTACC


TAACCCTCTA 


TATGACAGTT 


CTTTCCTTCA 


TTATCGGAGG 


720 


CTTCTTGGGG 


CTAGTGGCAG 


GTCTCTTTCT 


CGTCTTGACA 


GCGCCAGGTG 


GTGTCTTGGA 


780 


GAATAAAGTC 


GTATTCTGGA 


TTTTAGACAA 


AATTACCTCA 


ATTTTTCGTG 


CGGTTCCCTT 


840 


TATCATCCTC 


TTGGCAATCT 


TGTCACCACT 


TTCTCACTTG 


ATTGAAAAAA 


CAAGTATCGG 


900 


GCCAAATGCA 


AGCCCTTGTC 


CCACTTTCTT 


TTGCAGTCTT 


TGCCTTCTTT 


GCCCGTCAGG 


960 


TGCAGGTTGT 


CTTGGCTGAA 


ATGGATGGCG 


GTGTCATTGA 


GGCGGGCTCA 


AAGCGAGCGG 


1020 


AGCGACTTTC 


TGGGACATCG 


TGGGTGTTTA 


CCTATCAGAA 


GGTCTTCCAG 


ATTTGATCCG 


1080 


TGTGACGACT 


GTGACCTTGA 


TTTCCCTTGT 


TGGGGAAACA 


GCTATGGCCG 


GTGCGGTTGG 


1140 


AGCTGGTGGT 


ATCGGTAACG 


TAGCCATCGC 


TTATGGATTT 


AACCGCTACA 


ATCACGATGT 


1200 


GACCATCTTG 


GCAACCATCG 


TTATCATTTT 


GATTATCTTT 


GCAATCCAAT 


TCTTAGGAGA 


1260 


TTTCTTGACT 


AAGAAATTGA 


GCCATAAATA


AAAAAGAGCC 


GTGTGGCTCT 


TTTTAACTGA 


1320 


TCAGATTTTC 


TGGGCAAATT 


TTTTACTCAA 


GGCTTGTCCA 


ATCAAGGCAC 


CCACTAGGGC 


1380 


TCCGATGACA 


ATACTTGCGA 


TAAATAGAAG 


GACAGTTCCA 


GGGTTTGGAG 


CGACCATGAT 


1440 


GCGGTCGATA 


TATTCTTGGG 


ATTTTCCTCT 


TGCCAGAAGA 


GTAGCCATAT 


AGGCTTTGGG 


1500 


CGCAATCCAC 


ATAAGCAAGA 


TTGGTCCTGT 


TGTACTAAAG 


GCGAAAATAA 


TGAAAGAAAG 


1560 


GAAGTTCTTT 


GTTTTGTCCT 


TGTATTTTCC 


TAAATGAGCT 


ACTCCATCTG 


CTAGGAGGCC 


1620 


ACAGATAATT 


CGAT 










1634 



(2) INFORMATION FOR SEQ ID NO: 36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1087 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:



an A A TC A A


X vxtt. X ^J7 X V— X 


brv^ 1 1 bib* 1 


1 itl 1 AG AAA 


AAATATTTCC


TGAGCGCTTA 


60


C/iOri X X nu X X 


TGOGPT r rnr ,r p 

X VJVJVJ^ X X X 


A r T r PT r P ATP A 


1 XbjAbjb,bjbj 1A 


b-Abib, 1 b,b,b, 1 1 


GTGGTAGGAA


120


RprTATfrrT 


TTC1T r* T 1 T T PJ P4 

X -L VJJ J- v— X X X UvJ 


AATArnnpTT 


^ILi 1 1 bib» 1 1 


1 brbrbr 1L1 1 bib? 


/-» a rfP A rrtP A A rp 

bjA 1 bjA 1 tAA 1 


180


nrrAAfinrrA 


X X X \~ X /T. X X -T\ X 


rAr,Tr,A arnr 


rp a CC* A APPA A 


AAA C~*C*C*C* A A rp 

AAAbxbrb,bxAA 1 


TGAGATGTTA


240


nc4C4r , TAr i c4c i c4 


P T T C"T C4r* A O A 


nr4 r rr , r;'T ,r pnriA 


bib. X 1L1L 1 b, A 


rnrp a pprpmppp 
1 1 Ab. Li 1 brbrb. 


PPTPPPTip A A 

GGTGGGTGAA


300


VJ 1 iOl X \JK3\^ V_ 


t t* t pa pa t tpapa A 

XXX Ovjt X X UU/i 


PAPPTATPTT 
v^.r\Vii^ X r\. 1 1 


fTipm A /T^r^rp A rp 
1 V- 1 Abjb. b- x A 1 


A pfTippmpp a rp 
AG 1 GG 1 GGA i 


rrrrn hp m m rpi 

TTTTGGTGGT


360


RPPrPTTTAT 

VJV-^V^V^V^ X X Xr\± 


v_ x X V_ X X X \SJ 




A A A A rpp A A A f 1 


A A A P 1 A A prnn A 

AAAGAAGTGA


A/~"A A A A O A OP 

AGAAAAGAGG


420


C4A AnnAAnrA 


AOTfnTTT A A 

X V— \ J XXX -tA^A 


TTmAriAAAT 


PA A APPprpmp 
bj/^AAbrbrb- lib 


A rnrprnrnrn A /~> /~* rn 


mAPPfFAfTPP A 

TAGGTATGGA


480


Anrnnr ahtt 

xAVjV X X 


V-J X XXX \J X xA 


PPAATAPAPP 




PprnArT"TipP7i A 
LblAi 1LLAA 


Gill GA 1 GG 1 


540


rzrz A A Am A CZCZ A 


1 X X br 


pppAPT'TaTr 1 

b-b,b.Abrl 1/llL 


1 Abj 111 1 G 1 1 


C 1 1 AG PATC A 


TGCAGTTGAT 


600


criciri a rprvczTrz 


pap'T'papapapa'pp* a 


prnrpmmmnrpmm 


P" , rnrp/-i 7\ mmrnr^rn 
Li i biA 1 1 1 L 1 


ATCTTTAAAG 


AGAAACTGCT 


660


1L 1 bibi 1L 1 


vjvj 1 r\b- b. X 


1 1 brbib. X 1 blbrbj 


GGAAATGGTG


ATTGCCTTGT 


CTTCATCCTT 


720


bj 1 bibrbi 1 X /\ 


Vj 1 AVjl^.f-ibjbj.MA 


bj 1 o 1 1L1 bjbjb. 


mpp TV mr» imppn 

1 bbATTTGCC 


TATAGTGTAG


TCTTGACGAC 


780


GGTCTTTCAA 


CTTGTCTCTG 


AACGAATTCC 


AGCTAAACTC 


CTCAATCAAG 


CAACTTCATT 


840 


TGCTGTATTA 


GGCTGTAGTT 


TCGGAGCCTT 


TACGACCCCA 


TTCGTTCTAG 


GTGCAATTGG 


900 


CTTACTAACT 


CACAATGGGA 


TGTTGGTCTT 


TAGTATCTTA


GGAGGTTGGT 


TGATTGTAAT 


960 


CTCTATCTTT 


GTCATGTACC 


TACTTCAGAA 


GAGAGCTCTA 


GGATTGATTC 


CTAAGTTTTT 


1020 


CTTTTGATAC 


TCAATGAAAA 


TCAAAGAGCA 


AACTATAGTT 


GATTGAGTTT 


GGAATAGTAT 


1080 


GCTGTAG 












1087 



(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1191 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37: 



GGATTCCAAC 


GATTATGAAC 


TTGACTGGTC 


CACTGATTCA 


TCCAATGGCT 


TTAGAAACAC 


60 


AGCTTTCTTG 


GAATTAGTCG 


TCCAGACTCC 


TAGAAAGTAC 


AGCTCAGGTT 


TTGAAAATAT 


120 


GGTCGCAAAC 


GTGCCATCGT 


GGTTGCTGGA 


CCAGAAGGGT


TGGATGAAGC 


TGGCTTGAAC 


180 


GGAACAACCN 


AGATTGCACT 


TNTTGAAAAT 


GGCGAAATCA 


GCTTGTCAAG 


CTTTACTCCA 


240 


GAGGATTTGG 


GAATGGAAGG 


CTATGCTATG 


GAAGATATTC 


GTGGTGGGAA 


TGCTCAGGAA 


300 


AATGCAGAAA 


TTTTGCTTAG 


CGTTCTGAAA 


AACGAAGCAA 


GTCCATTCTT 


GGAAACGACA 


360 


GTCTTGAATG 


CTGGTCTTGG 


TTTCTATGCT 


AATGGTAAGA 


TTGATAGCAT 


CAAGGAAGGA 


420 


GTTGCCTTGG 


CCCGTCAAGT 


GATTGCTAGA 


GGCAAGGCCC 


TTGAAAAACT 


CAGACTGTTA 


480 


CAGGAGTACC 


AAAAATGAGT 


CAGGAATTTT 


TAGCACGAAT 


CTTAGAGCAG 


AAGGCGCGTG 


540 


AGGTGGAGCA 


GATGAAGCTG 


GAGCAAATCC 


AGCCTCTGCG 


CCAGACCTAT 


CGCTTGGCAG 


600 


AATTTTTGAA 


GAATCATCAG 


GACCGCTTGC 


AGGTAATCGC 


TGAGTCAAGA 


AAGCTAGCCC 


660


1 ±\\J 111 kjkjbiA


P 1 A Ti A mn t\ tv mr 1 
brA I A 1 LAA I L. 


1 bbrA 1 br ± brbjA 


TATTGTGCAA 


CAGGCCCAGA 


CTTATGAAGA 


720


A A A CCC APPA 


P 1 Tip 1 A rpf" 1 a rnrpm 
br 1 br A 1 biA 111 


bbjbjlbii IbjAL. 


A P"< A rnp A ppmm 


TTCTTTAAAG 


GGCATTTGGA


780


1 J. /t. -L ^ ± \J\JJ 


PI A A A rnmmpp a 


bi I b- Abrbr L AbA 


p« a Tirpppp A P«P< 
bjA 1 Ib,L.biAb.bj 


C I LAALAAAG 


ACTTTATCAT 


840


A f^l A TCZ A A A A 


pa a a tp A Trr 




1 brbAbibj 1 bjb-br 


A P 1 A t~* rnrn a rpprn 

ALAb I lAILi 


TGCTTATTGT 


900


GGCAGCCTTG 


TCCGAAGAAC 


GCCTCAAGGA 


ACTGTATGAC 


TACGCGACAG 


AGCTTGGTCT 


960 


GGAAGTCTTA 


GTGGAGACTC 


ACAATCTAGC 


TGAACTAGAG


GTAGCCCACA 


GACTTGGTGG 


1020 


CTGAGATTAT 


CGGGGTCAAC 


AACCGCAACT 


TGACTACCTT 


TGAAGTCGAC 


TTGCAGACCA 


1080 


GTGTAGATTT 


AGCCCCTTAC 


TTTGAGGAAG 


GTCGCTATTA 


CATTTCTGAA 


TCTGCCATTT 


1140 


TCACAGGGCA 


GGATGCGGAA 


CGACTAGCCC 


CATACTTTAA 


CGGAATTCGA 


T 


1191 



(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 858 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38:



ATCGAATTTG 


CCAACCAAGA 


AAAATATCCC 


TTGGATGGTT 


CTTGGCAATG 


CAAGCAATAT 


60 


CATCGTTCGT 


GATGGTGGGA 


TTCGTGGATT 


TGTCATCTTG 


TGTGACAAGC 


TCAATAACGT 


120 


TTCTGTTGAT 


GGCTATACCA 


TTGAAGCAGA 


AGCTGGGGCT 


AACTTGATTG 


AAACAACTCG 


180 


CATTGCCCTC 


CGTCATAGTT 


TAACTGGCTT 


TGAGTTTGCT


TGTGGTATTC 


CAGGAAGCGT 


240 


TGGCGGTGCT 


GTCTTTATGA 


ATGCGGGTGC 


CTATGGTGGC 


GAGATTGCTC 


ACATCTTGCA 


300 


GTCTTGTAAG 


GTCTTGACCA 


AGGATGGAGA 


AATCGAAACC 


CTGTCTGCTA 


AAGACTTGGC 


360 


TTTTGGTTAC 


CGCCATTCAG 


CTATTCAGGA 


GTCTGGTGCA 


GTTGTCTTGT 


CAGTTAAATT 


420 


TGCCCTAGCT 


CCAGGAACCC 


ATCAGGTTAT 


CAAGCAGGAA 


ATGGACCGCT 


TGACGCACCT 


480 


ACGTGAACTC 


AAGCAACCTT 


TGGAATACCC 


ATCTTGTGGC 


TCGGTCTTTA 


AGCGTCCAGT 


540 


CGGGCATTTT 


GCAGGTCAGT 


TCGAATTTCA 


GAAGCTGGCT 


TGAAAGGCTA 


TCGTATCGGT 


600 


GGCGTAGAAG 


TGTCAGAAAA 


GCATGCAGGA 


TTTATGATCA 


ATGTCGCAGA 


TGGAACGGCC 


660 


AAAGACTACG


AGGACTTGAT 


CCAATCGGTT 


ATCGAAAAAG 


TCAAGGAACA 


CTCAGGTATT 


720 


ACGCTTGAAA 


GAGAAGTCCG 


GATCTTGGGT 


GAAAGCCTAT


CGGTAGCGAA 


GATGTATGCA 


780 


GGTGGTTTTA 


CTCCCTGCAA 


GAGGTAGTGG 


GGACCTGACA 


GAGCCCCGAT 


CGGTTAATCT 


840 


ATGAAAAAGA 


AGGAATTT 










858 



(2) INFORMATION FOR SEQ ID NO: 39:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 980 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39:



CTGAAAAAAC 


AGGTTTTGAC 


TATGNAGATT


GACAGACGAC


CGTTCGGAGG 


TGCAGATATT 


60 


GATGCAGCAG 


GACCTCCCTT 


ACCTGATGAA 


ACCCTTAAGG 


CAAGTAGGGA 


AGCAGATGCT 


120 


ATCCTACTAG 


TAGCTATCGG


TAGTCCTCAG 


TATGATGGAG 


TAGCGGTTCG 


CCCTGAACAA 


180 


GGCCTGATGG


CTCTCCGTAA


GAACTCAATC 


TTTACGCTAA 


TATTCGTCCT 


GTAAAAATCT 


240 


TTGACAGTCT 


CAAGTATTTG 


TCACCACTCA 


AACCGGAACG 


AATTTCTGGT 


GTAGACTTCG 


300 


TCGTGGTGCG 


TGAATTGACT 


AGGCGAGATT 


TACTTTGGAG 


ATCATATCCT 


TGAAGAGCGC 


360 


AAAGCGCGTG 


ATATCAACGA 


CTATAGCTAT 


GAGGAAGTGG 


AGCGGATTAT 


TCGCAAAGCC 


420 


TTTGCCATCG 


AATTGCAAGA 


AATCGCAGAA 


AAATCGTTAC 


TAGTATCGAT 


AAGCAAAATG 


480 


TTCTAGCGAC 


CTCAAAACTC 


TGGCGGAAAG 


TAGCTGAGGA 


AGTCGCACAG 


GATTTCTCAG 


540 


ATGTAACCTT 


GGAACACCAG 


CTGGTAGACT 


CAGCTGCTAT 


GCTTATGATT 


ACCAATCCTG 


600 


CTAAGTTTGA 


TGTTATTGTA 


ACGGAGAATC 


TTTTTGGAGA 


TATTTTATCT 


GATGAATCAA 


660 


GCGTCTTATC 


TGGTACACTT 


GGGGTTATGC 


CATCAGCCAG 


TCATTCTGAA 


AATGGACCAA 


720 


GTCTCTATGA 


ACCTATTCAC 


GGTTCAGCAC 


CTGATATTGC 


AGGTCAAGGA 


ATTGCCAATC 


780 


CTATTTCCAT 


GATTTTATCA 


GTTGTCATGA 


TGTTGAGAGA 


TAGTTTCGGA 


CGTTATGAGG 


840 


ATACAGAGCG 


TATCAAACGT 


GCTGTTGAGA 


CAAGTCTGGC 


GGCAGGAATT 


TTAACGAGAG 


900 


ATATAGGAGG 


TCAGGCTTCA 


ACAAAGGAAA 


TGATGGAAGC 


TATTATTGCA 


AGGTTATGAA 


960 


GTTAGACGAA 


AAAATTCGAT 










980 



(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 874 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40: 



TCGATCTAGA 


GAATTGCTCC 


AGAGCTTCCT 


GACCGTCCGC 


TGCCTCAATA 


GTTTCATAGC 


60 


CACAATCCGT 


CAAATAATCA 


CTGACCCCCT 


CACGGATCAT 


CTCTTCATCT 


TCTACAATTA 


120 


AAATTTTCAT 


ACTTTAACTG 


CTCTCTATTT 


TTTATTTTTC 


TTAGAATAAA 


TACCTACTCT 


180 


ATTTTCTATT 


ATAGTCTCTT 


GCTGGCCTTT 


TGTATGTAAG 


CAACTGACCA 


CTAGATAAAA


240 


CGTTGTGAAA 


TTCCTTTCTC 


ATAAATTCCA 


TAACTTTAGT 


ATATTATATT 


TAAGCACTAA 


300 


AGTACAAAGA 


AAGCAACTGA 


AAGCAATGAT 


TTTCACCACT 


GCTTTCAGAT 


TTATTTTGAA 


360 


TTGTTAAATA 


GCTATTCCTA 


TCCACTATTC


TTGAATAGAA 


ACACAAGATG 


CAATCTTTAT 


420 


TCCAGACTCA 


TTTTTTAAAA 


AATCAAATTT 


ATTCACCATC 


CAGCAAGAGC 


TCTTTTGGTT 


480

GTTTTCTAAG GAGATTGCTT GAAGCAAGCG CCATAACGAG AACCACTAGA ACCAAGGCAA 540

GGACAAAAAT GATGATAAAG TCTGATGTCT GAATGGAAAT GTCTAGGCTC GACAAGGTCT 600 

TGCTAAAGCC ATCTACTTCT GCACCGCCAC CAAGGTTAGA GGCTTGAGCC GCCTTACTAG 660

CCTGTTTGGC AACACCTGAA GTCACATTGG CAAGGACAGT GTTTCCAATT CGCACGGGCA 720

GTGTAATTAG CTAGGAAGTA AGCANAAACT AGAGCAGGGA TAGCAATCAA GATAGATTCG 780

GTGATGAATT GACCCAAGAT ACTTGCCTGC TTGAGACCAA TAGAGAGGAG GATTCCCACT 840 

TCCTTGCCGA CGGGCATTGA TCCAAAGACT GAGC 874 

(2) INFORMATION FOR SEQ ID NO: 41:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 762 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41:



CTTGTAACGG 


TCATAAAGTT 


TCTGCAAACT 


ACCATCCTTG 


CTCCATTTAG 


TAACCAAGTT 


60 


ATCAAGATAG 


TCGTTGAGCT 


CTGTATTTGA 


TTTCTTGGTA 


ACAATACCGT 


AGTCAGATGG 


120 


CTTGAAACTA 


TCATCTAGTA 


GTTCTGTGCG 


TTTAACTAGT 


GTAGCCAGAT 


AGAATAGAGC


180 


GGTCAACGGA 


AAAGGCATCG 


ATACGATGAG 


CGTGAAGGGA 


AGTAATCAAT 


TCTGGGTAGG 


240 


AACCAAGTTC 


GACGAATTTA 


AACTTCAGAC 


CTTTCTTTTT 


ACCCAGTTCA 


GTAATCAGGC 


300 


GTTGGGTGAT 


AGAACCTTGG 


GCGACTCCGA 


TGGTTTTGCC 


GTTTAGGTCC 


TCAATCTTTT 


360 


TGATTTTGGC 


AGATTTATTG 


ACCAAAAATC 


CAGAAGCGTC 


TGTGTAGTAG 


GGACTGGTAA 


420 


AGTTGTAGAG 


TTTTTTGCGT 


TCGTCCGTGA 


TGGTAAAGGT 


CGCGATATCC 


ATATCGACCT 


480 


GTTCATTGTC 


TAGAAGGGGG 


CCGCGGGTTT 


GTGCTGTAAC 


CGGCACATAG 


TGAATCTTGA 


540 


CCTTGAGTTC 


ATCAGCTACC 


ATTTTGGCCA


AGTCGGTTTC 


GATACCAGAA


TAAGTACCGG 


600 


TCTTGGGATC 


TTTGTTAACC 


AAAATTGGGA 


ACGTCTTGTT 


TGACACCCGA 


CAACCAGTTC 


660 


GCCTCTTTTT 


TGAATGTCTG 


CGATACTAGT 


ATTAGCCTGG 


ACTGGTTTGG 


CAGCAACAAG 


720 


GCCGAAAAGG 


CTAATCAATA 


ATGCTGATAA 


AAAGAATTCG 


AT 




762 



(2) INFORMATION FOR SEQ ID NO: 42: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1942 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42:



CTCGAATTTT


TGGTGCTCCA 


GAAACGGTTC 


CAGCAGGAAG 


CGTTGCTTTC 


AAGGCATCCA 


60 


TGGCAGTGAG


I rGTGCAAGC 


AAACGTCCCT 


TGACCACACT 


GGTCAAATGC 


ATGACGTAGC 


120 


OO A 'A O 7\ ppmr 1 


CALL ILCAIA 


TACTTAGTAA


CTTGGACACT 


GGCCGTTTCA 


GAGATGCGGC 


180 


LAAlArLbl 1 


AbbLLLLAAb 


TCTACCAACA 


TTCGATGTTC 


TGCTGTTTCC 


TTCTCATCAG 


240 


AbAbbAbCjT C 


AG 1 LbLbAAb 


GCCTTGTCTT 


CTCCATCCGT 


AGCCCCTCTT 


GGTCGCGTCC 


300


C I GC AA I Lbb 


All GG 1 1 G 1 C 


AGGATGCCAT


TTTTGACAGA 


AACCAAACTT 


TCTGGACTAG 


360


CTCCGATGAT


TTGATAATCC


CCAAAATCAT


ACAAATAAAG 


GTAATTAGAT 


GGATTAGTCA 


420


LbCbbAbATT 


TGTGTAGAAG 


TCAAATGGAT 


TTCCAGTTAA 


CTTCTGCGTG 


AAGAAAACGC 


480 


TGGCTGAGTT 


ACACATCGGA 


ACATATCTCC 


GTTACGAATC 


AAGTCACGAG 


CTGTTTCTAC 


540 


CATTCCCTCA 


AACTTATGTG 


GAGCGATATG 


CGGTTTGAAG 


TCAAGTGGTG 


ATAAATCCAA 


600 


GTCTTCAAAT 


TCATTTGGAG 


CAGGAATGCG 


TAATTCCTCA 


AGCACTTGGT 


TCAAGGATTT 


660 


TTCCAAGGCC 


TCTTGACTGC 


GCTCACTATA 


AAGTGCATCC 


TCTATGACAT 


GTTATCTTCT 


720 


CCTTCTTGTT 


GGTCAAAGAC 


CATATAGCTC 


TCATAGACAA 


AGAAATGCAT 


GTCGGGCGTC 


780 


CCAATTGTAT 


CCTCAGGGAT 


TTGACCAATT 


TCTTCATAAA 


GCGAAATCAT 


ATCGTAACCA 


840 


ACAAAACCAA 


TGGCTCCCCC 


ACCAAAAGGG 


AGGTCTGAAT 


GGTGCTGGCT 


CTTATGAATC 


900 


ACTTCATAAA 


GGAAATCCAA 


GGGATCCCGA 


TCAATCGCTT 


GACCATTTTG 


ATAGAGAACT 


960 


CCATTTTCAA 


ACTTAATCTC 


AAAAACTGGA


TTATAGGCTA 


GGATAGAAAA 


ACGAGCTGTT


1020 


TCCTTGTCTC 


TCGGAATACT 


CTCTAAAATA 


ACCTTATGTT 


GCCCCTTTAA 


GCGCATATAA 


1080 


GCCAAGATTG 


GTGATAAGAC 


ATCTCCATGA 


ATGATTCGTT 


CCATTGTCAT 


TTCCCTTTCA 


1140 


GTTCTAATTC 


GAGTTCGTGG 


CGACTGTATG 


AAAAATCCCC 


ACGCAAAATA 


ACTTGCGTGA 


1200 


GGACGAAATT 


CGCGGTGCCA 


CCTCAATTAT 


AGGATTTCTC 


CTATCTCTCA 


TTCCTGTCTC 


1260 


AGATATCTCC 


TGTAACAGGC


TGTGCGATAA 


AGGGCACTCC 


CTTGAGAATG 


ATGTTTTCTT 


1320 


CTCTCGTTTC 


AGATGAACCC 


AACTTTACAG 


CTTTCTCTGC 


TTGTTTTCAG 


CAACCACAAG 


1380 


CTCTCTGTGA 


GAGAAAAGAC 


TGTAATTTTT 


CCATCTATTA 


TTTTTTAGCT 


TCTAGTAATC 


1440 


TGCAATCGCA 


GCTAGGTCCT 


TGCCTCCACG 


ACCAGAGACA 


TTGATGAAGA 


GATGTTCATC 


1500 


TCGGTACACC 


TTTATACTCT 


TCGAAAATCT 


CTTCAAACCG 


CGTCAACGTC 


GCCTTGCCGT 


1560 


AGGTATGGTT 


ACTGACTTCG 


TCAGTTCTAT 


CTGCAACCTC 


AAAACAGTGT 


TTTGAGCTGA 


1620 


CTTCGTCAGT 


CTTATCGACA 


ACCTCAAAAC 


AGTGTTTTGA 


GCAGCCTGCA 


GCTAGTTTCC 


1680 


TAGTTTGCTC 


TTTGATTTTC 


ATTGAGTATT 


ATTTCATTTT 


CTCCTGCAAT 


TGAATTCTTG 


1740 


CTCAGCTTTT 


TGTCTTCTAT 


TTCTTTAAAA 


TCAAAGTAGC 


TCTTTTGTTA 


ATAACTCGAT 


1800 


CAACAAACAT 


CGTGGTACAA 


GTATCTACTT 


TGAAATTTAT 


CAACCACTTA 


ACAACTGATA 


1860 


CTGTATTTCT 


AGGAAAACGA 


TGACATTCTT 


CCTAATAAAA 


CTTCTCATAT 


ATAGCATAAA 


1920 


TTTCTACTCT 


TTTTAATTCG 


AT 








1942 



(2) INFORMATION FOR SEQ ID NO: 43: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1048 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43:



CTGTTAAGAT 


TGTTTCCGTG 

X V_J XXX — ^VJ X VJ 


PATPPAPATA 

V-ri X >v\.ft^A X zt. 


PPATTT APP*T 
x X 1 ALL X 


r T , P ir PP ,r nP ,r n A rpp 1 


c^CT^r* a a mmp 1 a 

LjLtLLAAX 1 LA 


60


CCCATCAAAA 


CGCCATAGGT 


PTPATPTPTP 

L X LZTu X L X Vj X L 


A APAT l Ar r PZ\( ri 
1 H.L 1 Abi 


A P 1 A m A C*f~*r* A rn 
ALA 1 AL LLtA X 


7\ mmom a r*r* a a 
Al IblALLAA 


120


AGACTGGTAT 


GACGGAAATA 


APTPPATPPP 

ZT.VJJ X V-VJA X L3L \J 


TPT A A APTP A 
X Lr X zn-rizAL 1 L/i 


AP'AAAAAP'AP 1 
ALAAAAAbAb 


ALbLAAbl 1 br 


180


ATTAGAAAAA 


CCGTCATAGC 


AATAPPTPPP 
Jul x n.vjjv — xoll 


AP APP APPTT 
zALz-iLTL T zn.L3L J. X 


p* A APPfiPS AT" 

LtAALL alaa X 


P 1 A PTIPPP A A P» 
LALt X bLLAAL 


240


ATGGCAAACT 


GGGCACTPPP 


AGP AT A A AP A 

rloV-A X z^zTJ-i.L.tt, 


A AP AP A PTP A 
z^i-ibiALrAL 1 LA 


rpp A A P* P" P" P 1 A m 
1 LAAbL L LA 1 


L 1 LAALAbb x


300


GTCACATAGG 


GPGCAPPGAT 


APTPPP AP AP 
r\\J Lz-ILz^Lj 


PPPAPPPPP A 
LLLAbbLLbA 


rn 71 pmp A i"** A m A 
1 Al 1 bALA X A 


bLLAAbAbLL


360


GTTGGPATGG 


PTPPPTPPPP 


PPPPTPPTA A 
LLLL ILL Xz-iz-i. 


A A mP 1 P*m mm mm 

AA xLLxXxxi 


til IlAJLl 1 1 


mo mo/""* mo a mA 

TlTllTlATA 




X X X X X rvfl X 


A ATAPTP A AT 


PA A A ATP AAA 
\zr\r\r\±\ X L/\AA 


P^ A P'P* A A A prp 7\ 
Lj Abr L AAAL 1 A 


r~*r~* a a a mm 7\ o o 
LrLrAAA I i Abb 


CGCAGGNTGC


480


x Lz^AzT-ttxaLz-VL L 


o rprnrprpo a PlPim 

Ui 1 1 1 VjZ-WjLT 1 


TPP ZIP AT 1 ap* a 
X LrLALrA 1 ALiA 


AAl 1 bALbAA 


bTLAbL TC AA 


AACACCGTTT 


540 


TP APPTTPP A 


PATAPAAPTP 


appA ap~ , t , p' ap 1 

ALbrAAbi 1 LALj 


1 AAlA 1 A TAT 


AlGGCAAGGC 


GACGTTGACG 


600 


TGGTTTGAAG 


AGATTTTCGA 


AGAGTATTAG 


AAAATGCCGA 


TAAGGGTCTG 


CATACCAAGG 


660 


CTGGTGAGGA 


TGATGGCAAT 


CCAGCAGACG 


GCTCCGAGAA 


CAATGGATTT 


TCCACTGGAT 


720 


TTGACCATAG 


CGACCAGATT 


AGTTTTGAGA 


CCGATGGCAC


TCATGGCCAT 


GATAATGAGG 


780 


AATTTAGAGA 


GTTGTTTGAG 


AGGGGTAAAG 


AAACTACTAG


ACACACCGAG 


AGAGGTCAGA 


840 


AGGGTGGTTA 


GGAGCGATGC 


AAGGATGAAG 


TAAAGGATAA 


AAAGTGGGAA 


GACTTTTTTC 


900 


AGTTGTAAGC 


CTTGCTTATT 


TTTTTGCTCG 


CGACTTTGCC 


AGTAGGAGAG 


AAAGAGAGTG 


960 


ATGGGGATGA 


TAGCTAGGGT 


GCGCGTGAGT 


TTGACAATGG 


TTGCGGATTC 


GAGGGTATTG 


1020 


GTCTGGTAGA 


GACTGTCCCA 


AGCGCTAG 








1048 



(2) INFORMATION FOR SEQ ID NO: 44: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1571 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44:



AGAGCTGGTA 


ATATTCCCAA 


AGAAACGGCT 


CAAATCGAAT 


TAGAAAGCCT 


TCTGCAAAAA 


60 


GGAATCCCAG 


TCGCTCTGGT 


ATCACGATGC 


TTTAACGGTA 


TTGCCGAGCC 


TGTTTATGCC 


120 


TACCAGGGTG 


GGGGCGTACA 


GTTGCAAAAA 


GCAGGCGTTT 


TCTTTGTTAA 


AGAACTCAAC 


180 


GCCCAAAAAG 


CCCGCTTGAA 


ACTCCTCATC 


GCCCTCAATG 


CCGGACTAAC 


AGGACAGGCT 


240 


TTGAAAGACT 


ATATGGAAGG 


CTAATACTCT 


TCGAAAATCT 


CTGCAAACCA 


CGTCAGCGTC 


300 


GCCTTACCGT 


ATGTAGAGCA 


CAAAATCAGG 


AAATCTTCTC 


GATTCCCTGA 


TTTTTTCTAT 


360 


TTACGTTTTC 


GTGTTGAGCT 


ACGTTCTGTC 


AAACCATGAG 


GTAAGAGAAC 


TTCACGTTCT 


420 


TCCAACTCTT 


CCTTATGCAT 


AATCTTGGTC 


AACATACGCA 


TACTAATGGC 


ACCAAGGTCA 


480 


TAAAGAGGTT 


GGGCAATCGT 


TGTCAAGTTT 


GGACGGGTAA 


AGCGTGAGAT 


TTGTGAATCA 


540 


TCACTAGTAA 


TAATTCGATA 


ATCTTCTGGC 


ACAGAAACAC 


CTTATCAGCC 


AAACCGTTCA


600


AGACTCCTGC


TGCCAACTCA 


TCACCTGTCA 


CAACTGCTGC 


AGTTGCATTT 


GATGAAATCA 


660 


AACGCTCTGC 


TAAGGCGTAA 


CCATCATCAT 


AGCTATATTT 


AGATTCAAAT 


ACCAAACCCT 


720 


CACTATAAGC


GATTCCTGCT 


TTTTTCAAGG 


TTTCCTTGTA 


GCCAACTAAA 


CGAACCTTAC 


780 


CATTGATGTC 


ATCCACTAGC 


GGACCGCTAA 


CGAAAGCAAT 


ACGCTCATTT 


TCTTTAGCAA 


840 


GGTAACTCAC 


TGCATCAATT 


GTTGCTTGCT 


TATAGTCAAT 


ATTGACACTT 


GGCAACTGGT 


900 


GCTCAACATC 


GACAGTTCCT 


GCGAGAACAA 


TCGGAGTACG 


TGAACGCGAA 


AATTCTGAGC 


960 


GAATTTTATC 


TGTCAAGTGA 


TAACCCATAT 


AGATAATGCC 


ATCTACCTGC 


TTTGAAAAGA 


1020 


GGGTATTGAC 


AACAGAAACT 


TCTTTCTCGT 


TATCTTCATC 


GCTATTAGCT 


AGGACAATAT 


1080 


TGTACTTGTA 


CATTTCTGCA 


ATATCATCAA 


TCCCCTTAGC 


CAAACTCGAA 


AAATAACCAT


1140 


TGGTAATATT 


TGGAATCACG 


ACACCGACAG 


TGGTTGTCTT 


TTTACTTGCA 


AGACCACGCG 


1200 


CAACTGCATT 


TGGACGATAA 


TCCAAACGAT 


CAATTACCTC 


TAGCACTTTT 


TTACGGGTAT 


1260 


TCTCTTTTAC 


ATTTTTATTG 


CCATTGACCA 


CACGGCTGAC 


CGTCGCCATG 


GGAAACACCT 


1320 


GCTTCACGAG 


CGACATCATA 


AATGGTTACT 


GTATCATCTG 


CATTCATTCC 


TTTTCCTGTC 


1380 


CTTTCTATCT 


CCACACATTC 


TTTTACAAGT 


AGAAGTGCTG 


AATTGAAAGC 


TCTATATCTT 


1440 


ACTTACAAAA 


ATGAAGATGT 


GAAAATTTCG 


TTTTCATATT 


TCTACTTATT 


CCATTCTATC 


1500 


ACTAATTGTA 


AACACTTTCA 


AGTGTTTTTT 


GAAGATTGAT 


TGAAAAAATT 


TCATAGAAAA 


1560 


CCTAGGTTTA 


G 










1571 



(2) INFORMATION FOR SEQ ID NO: 45: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1682 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45: 



CTGACGTAAA 


AAAGATTTTC 


GGAAAAGTAT 


CATCATCTAT 


TTTAGACCAT


TTTCTTATAA 


60 


TAACCATTTT 


ATTTTTATTT 


GTCAAGGTCT 


TTGAATTCTT 


TCTTAAACAA 


GCCTTGTAAT 


120 


CTCTACTTTT 


GAAGAATTTA 


TTTTTCCTTA 


CTGACAAGAT 


TTGAGACGGT 


AGGAATCATT 


180 


GAAAATAACC 


TAGCCAACAT 


CAATCACAAT 


CATTTCTCCT 


TTCTCAATTA 


CACTAAATTA 


240 


TAGTGTATTG 


AATCTATAAC 


AGTGCACCTT 


GGCTGCTAAA 


ATATTTCTAT 


AAATTAATTT 


300 


GACTTTCCTG 


ATAGAGTTGT 


TCACATCTTA 


TTTCAATTCA 


CTATACTTTC 


CCTTATACTC 


360 


AATGAAAATC 


AAAGCGCAAA 


CTAGGAAGCT 


AGCCACAGGC 


TGCTCAAAGC 


ACTGCTTTGA 


420 


GGTTGTAGAT 


AAGACTGACG


AAGTCAGTTA 


CATATATCTA 


CGGCAAGGCG 


AAGCTGACGC 


480 


GGTTTGAAGA 


GATTTTCGAA 


GAGTATAAAG 


TTTGTTTCTG 


TATCTTTCAG 


AAAAATAAGG 


540 


TATACTGTAT 


GTAAACGATT 


TCAAAGGAGT 


CCAGTTATGG 


CAAAAACATT 


TTTTATTCCA 


600 


AATAAACAGA 


GCATTTTAGG 


AGAACAAGAG 


ATTTTGAATG 


CCAAGTCGAT 


CTTGGCTATG 


660 


ATGTAGTCTA 


TCTCCGTCAG 


CCTCTTAATC 


GTCTCGAGTA 


TATTGAGTGT 


GCGATAGTGG 


720 


GGCAATCACA 


ATTTCTTTTT 


AAGGTCAGTT 


ATGCTGATGG 


TCAAAAGGCT 


TACCGTGTCG 


780 


ATCTTCCTGA 


CCTACTAACA 


AAGACAGACT


GGCAGATTAT


CAAGTCATTT 


TTAGATGTTT 


840


TGCTTGCTTA


TACAGGGACT


GATATTGAAG 


GGCTAGATGG 


TTTTGATTTT 


GAAGCTTATT 


900 


TCCAAGCAAG 


TATTCAAGCC 


TATCTAGCAG 


ACCCTGTAGC 


TCGTTTTACG 


ATTTGCCAAC 


960 


bAAl 111 I AA 


1 LL 1 Al 11 ib 


TTTAGTCGTG 


AGAACTTGAA 


AAGCTTTTTA 


GAGGCAGATG 


1020 


GCTTGGCTCA


Gil IbAAbLb 


CGTGTGCGTG


CGGTTCAAGA 


GACAGATGCC 


TACTTTGCGA


1080 


/—^ 7\ m np (~n rri rri 

vjAb; 111 1 1 


C 1 A 1 LAbbA 1 


GGAGAAGGAA


AAGTGCATGG 


CGTTTACCAT 


CTAGCTCAAG 


1140 


bAb 1 LAAbAL 


Atjl 1 1 lALLb 


AGAGAACCGT


TTGTTCCTGC 


AGCCTATATT 


GAGCGAATTG 


1200 


Cr 1 brbiA 1 AACrbr 


AAbi 1 C C Ab* 1 G 


GGAGATTGAC 


TTGGTTCAAA 


TCACAGGAGA 


CGGCTCTAAA 


1260 


LLAbAAbAL 1 


AlCrAAlCCAl 


AGCTCGCTTG


GACTATGCAA 


AATTCTTAGA 


GGTATTACCC 


1320 


LLA1L 11111 


ACCACCAACT


AGACGCCAAT 


CAAATAGAAA 


TACAACCCAT 


CCTAGGACAA


1380 


bAi 1 1 lAAAA 


CATTAGCACA


AGAAAAGTAA 


AGCAGAAGCA 


GGTCAATCGA 


CTTGCTTTTT 


1440 


TGACATAGAA 


AAAATCCTGC 


CAAGGATGAC 


AGGATTGCTA 


CTCAATGAAA


ATCAAAGAGC 


1500 


AAACTAGGAA 


GCTAGCCGCA 


GGCTGTACTT 


GAGTACGGTA 


AGGCGAAGCT 


GACGTGGTTT 


1560 


GAATTTGATT 


TTCGAAGAGT 


ATGAATTTTA 


AAGAAAGGCC 


AAGATACGAA 


GATAATCTCC 


1620 


AATCAGTGCC 


ACTTCAGCTT 


CCAAGAAGAA 


GAAGATTATA 


ACTCCCGTTC 


CCCAAGGACA 


1680 


GA 












1682 



(2) INFORMATION FOR SEQ ID NO: 46: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3041 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46: 



ATCGAATTAA 


AAATGAGGTA 


TTCAGGCTTG 


TGATTTTCTA 


TGGAAGTTAA 


TAGTGATTGC 


60 


CTCTAATGCT 


TACAAGTGAT 


ATTAAAAATA 


GAGGACCTAG 


TGATGTCAAT 


CATTTCAACT 


120 


GATTTAACCC 


CTTTTCAAAT 


AGATGATACA 


TTGAAAGCAG 


CCTTGCGAGA 


AGATGTTCAT 


180 


TCCGAAGATT 


ACAGTACCAA 


TGCCATTTTT 


GATCATCATG 


GCCAAGCCAA 


GGTGTCGCTT 


240 


TTTGCCAAGG 


AAGCTGGTGT 


TTTAGCGGGG 


CTAACCGTTT 


TTCAAAGGGT 


TTTTACCCTA 


300 


TTTGATGCCG 


AGGTGACCTT


CCAGAATCCT 


CATCAATTTA 


AGGATGGGGA 


TCGTTTGACT 


360 


AGTGGCGATT 


TGGTTTTAGA 


AATCATAGGC 


TCGGTGAGAA 


GTCTCTTAAC 


ATGTGAACGC 


420 


GTTGCCTTGA 


ATTTTTTACA 


ACATTTATCA 


GGGATCGCTT 


CGATGACAGC 


TGCTTATGTA 


480 


GAAGCCTTAG 


GCGATGATTG 


CATTAAGGTA 


TTTGATACTC 


GAAAAACTAC


TCCTAATTTA 


540 


CGTCTTTTTG 


AGAAATATGC 


CGTGAGAGTT 


GGCGGTGGCT 


ATAATCATCG 


CTTTAATTTA 


600 


TCAGATGCTA 


TCCTGCTAAA 


AGACAATCAC 


ATTGCGGCAG 


TAGGTAGTGT 


TCAAAGGGCA 


660 


ATTGCTCAAG 


CGCGTGCCTA 


TGCTCCTTTT 


GTGAAAATGG 


TCGAGGTGGA 


AGTGGAAAGC 


720 


CTTGCTGCTG 


CCGAAGAAGC 


TGCGGCGGCG 


GGTGCTGATA 


TTATCATGTT 


GGATAATATG 


780 


TCATTGGAAC 


AGATTGAACA 


GGCCATTACC 


CTAATTGCAG 


GACGTTCTCG 


GATTGAATGT 


840 


TCTGGAAATA 


TTGATATGAC 


CACTATTAGC 


CGTTTTCGTG 


GTTTAGCGAT 


TGATTACGTC 


900 


TCCAGTGGTA 


GTTTAACCCA 


TAGTGCTAAG 


AGTCTTGATT 


TTTCCATGAA 


GGGTTTAACC 


960 
TACCTTGATG 


TCTAAGTTGT 


AAAATAAACT 


AACTTTTTAA 


AGGATGTCTT 


TCCTCTAGAA 


1020 


CGAGTTTTAT 


GTCAGATAGT 


TTAAACGCCT 


CTTCAAATAT 


AGTAAAATGA 


ACCAAAAATA 


1080 


blALALAA 1 G 


1 GG 1 A TAAT. C 


TTCTTATGGC 


ATATTCAATA 


GATTTTCGTA 


AAAAAGTTCT 


1140 


TTCTTATTGT 


GAGCGAACAG 


GTAGTATAAC


AGAAGCATCA 


CACGTTTTCC 


AAATCTCACG 


1200 


1AAIALLA1 1 


1 A 1 GGC TGGT 


TAAAGCTAAA


AGAGAAAACA 


GGAGAGCTAA 


ACCACCAAGT 


1260 


AAAALjGAACA 


AAAL C AAG AA 


AAGTTGATAG 


AGATAGACTT


AAAAACTATC


TTACTGACAA 


1320


1 CCAGACGC 1 


1AX 1 1GACTG 


AAATAGCTTC


TGAATTTGGC 


TGTCATCCAA 


CTACCATCCA 


1380


C I A1GCGC 1C 


AAAGG 1 A I GG 


GCTACACTCG 


AAAAAAGGAC 


CACACCTACT 


ATGAACAAGA 


1440 


CCCAGAAAAA 


GTAGGCTTAT 


TTCTTAAAAA 


TTTTAATAGT 


TTAAAGCACC 


TAGCACCTGT 


1500 


1 1AGAI IGA1 


GAAAGAGGAT 


TCGATACTTA 


TTTTTATCGA 


GAATATGGTC 


GCTCATTAAA 


1560 


AGG I LAG 1 1 A 


ATAAGAGGTA 


AAGTATCTGG 


AAGAAGATAT 


CAGAGGATTT 


CTTTGGTTGC 


1620 


AGGTCTAACA


AATGGTGAGT 


TAATCGCTCC 


AATGACTTAC 


GAAGAGACGA 


TGACGAGCGA 


1680 


CTTTTTTGAA 


GCATGGTTTC 


AGAAGTTTCT 


CTTACCAACA 


TTAACCACAC 


CATCGGTTAT 


1740 


TATTATGGAT 


AATGCAAGAT 


TCCATAGAAT 


GGGTAAGTTA 


GAACTTTTAT 


GCGAGGAGTT 


1800 


TGGGCATAAA 


CTTTTACCTC 


TTCCTCCCTA 


CTCGCCTGAG 


TACAATCTTA 


TTGAGAAAAC 


1860 


ATGGGCTCAT 


ATCAAAAAGC 


ACCTCAAAAA 


GGTATTACCA


AGTTGCAATA 


CCTTTTATGA 


1920 


GGCTCTTTTG 


TCCTGCTCTT 


GTTTCAATTG 


ACTATAGTTC 


ACGGATACAG


TTGGGAAAGA 


1980 


AGTTAAATGT 


AGTTGGATTT 


CCACTAAAGG


TTGATGAGTA 


AGTTTTTGTA 


TCTGAACCTG 


2040 


ATTGGCCGCA 


AGCAGCTAAA 


AGCAAAGCAG 


ATGCAAAAGT 


CAGACCTGCA


CCAAGGACAC 


2100 


GCTTCTTTAT 


GTTCATCTTC 


TTTCTCCTTA 


ATAGTGGGAA 


TTTGTAAAGT 


TAATTGAATT 


2160 


TCAAGAATGA 


AGGTTTTATA 


AACTTTGGTT 


ATAAAAAACA 


AAGGATTTCT 


GTCTTTTATA 


2220 


CAGTCCTCCC 


CTTGTTTTTA 


TACGATTTCA 


ATTTTAAATT 


TTTCTGCAAA 


AAATATTTAT 


2280 


AGTAATTCCA 


CACAGAAAGC 


ATCCCATGGA 


ACTAAGATTT 


GTTTTTCAAA 


GACTTCTTGA 


2340 


GCTAGGGTGT 


TTTCAATCAA 


GACAGATTTG 


ACTTTTCCTT 


CTACTGTCAA 


GTCTTGCTCT 


2400 


TCATTGGACA 


AGTTAGCCAC 


AACTAGGAAG 


CGACGGTCGC 


CATCCTTACG 


TATATAAGCA 


2460 


AAGACCTTAT 


CAGCCGTATC 


AAGCAATTCA 


AAGTCAGCTC 


GAATTAGCCA 


ACTATTCTCC 


2520 


TTGCGAATTT 


GGACCAGTTT 


CTGATAGGTA 


TAGAAAATAG 


AATCTGGATT 


TGCCAGCGCT 


2580 


ICTTGGACGT 


TGATCATCTC 


GTAATTTGGA 


TTAACTGCCA 


ACCAAGGTTG


ACCTGTTGAG 


2640 


AAACCAGCGT 


TTTTGCTCTC 


GTCCCATTGC 


ATAGGGGTAC 


GGGCATTGTC 


ACGTCCAATA 


2700 


ALALbbA 1 AC 


TGTL C ATGAT 


TTCTTGCATC 


GGAACACCTT 


TTTCAAGAGC 


CTCACGCGCA 


2760 


TAGTTGAGAG 


ATTCAATATC 


TTCTACTTGA 


TCCAGTGTTT 


CAAACGGATA 


GTTGGTCATC 


2820 


CCAATCTCCT 


CACCTTGGTA 


GATATAAGGA 


GTTCCTCTCA 


TAAGATGAAG 


CAAGATTGCA 


2880 


AAGGCTTTGG 


CAGATTTTTC 


GCGGTATTCT 


TGGTCATTTC 


CCCAGATTGA 


GACAATACGA 


2940 


GGGAGGTCAT 


GGTTGTTCCA 


GAAGAGGGAA 


TTCCAGCCGT 


CCTCAACTCC 


TAACTCTGTC 


3000 


TGCCATTTGT 


TGAAGATTTC 


TTTTAACTTA 


GCGATATTCA 


G 




3041 



(2) INFORMATION FOR SEQ ID NO: 47: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4694 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47:



TTAATTTAAA


TTCTTAAAAT


TTTTTCATAA 


TAATCTCCCT 


ATAAAAATAA 


AGTCGCCCAA 


60


1 GAGGGGGG 1 


lAl 11111 1G 


AAAAA 1 GGGG 


1 TGGTGCCTG 


AGAATAAATA 


GCTTAGTGAT 


120


AG AAGAAAA 1 


GGGGAAA 1 A 1 


GG i AT AAT. G A 


AACGATAGAT


TTTTGAATAG 


GAATAAGATC 


180 


ATGTTTGGAT 


TTTTTAAGAA 


AG A 1 AAAGG G 


1 GTGGAAGTA 


GAGGTTCCGA 


CACAGGTTCC 


240


TGCTCATATC 


GGCATCATCA 


TGGATGGCAA 


TGGCCGTTGG 


GCTAAAAAAC 


GTATGCAACC 


300 


GCGAGTTTTT 


GGACATAAGG


CGGGCATGGA 


AGCATTGCAA 


ACCGTGACCA 


AGGCAGCCAA 


360 


CAAACTGGGC 


GTCAAGGTTA 


TTACGGTCTA 


TGCTTTTTCT 


ACGGAAAACT 


GGACCCGTCC 


420 


AGATCAGGAA 


GTCAAGTTTA 


TCATGAACTT 


GCCAGTAGAG 


TTTTATGATA 


ATTATGTCCC 


480 


GGAACTACAT


GCGAATAATG 


TTAAGATTCA 


AATGATTGGG 


GAGACAGACC


GCCTGCCTAA 


540 


GCAAACCTTC 


GAAGCTTTAA 


CCAAGGCTGA 


GGAATTGACT 


AAGAACAACA 


CAGGATTGAT 


600 


TCTTAATTTT 


GCTCTTAACT 


ATGGTGGACG 


TGCTGAGATT 


ACACAGGCGC 


TTAAGTTGAT 


660 


TTCCCAGGAT 


GTTTTAGATG 


CCAAAATCAA 


CCCAGGTGAC 


ATCACAGAGG 


AATTGATTGG 


720 


TAACTATCTC 


TTTACCCAGC 


ATTTGCCTAA 


GGACTTACGA 


GACCCAGACT 


TGATTATCCG 


780 


TACTAGTGGA 


GAATTGCGTT 


TGAGCAATTT 


CCTTCCATGG 


CAGGGAGCCT 


ATAGTGAGCT 


840 


TTATTTTACG 


GACACCTTAT 


GGCCTGATTT 


TGACGAAGCG


GCCTTGCAGG 


AAGCTATTCT 


900 


TGCCTATAAT 


CGTCGCCATC 


GCCGATTTGG 


AGGAGTTTAG 


GAGGAAATAT 


GACCCAGGAT


960 


TTACAGAAAA


GAACCTTGTT 


ATGCAGGGAT 


TGCCCTGACT 


ATTTTCCTAC 


CAATTTTAAT 


1020 


GATTGGGGGC 


TCTTGCTTCA 


GATAGCAATC 


GGAATCATAN 


CCATGCTAGC 


CATGCATGAA 


1080 


CTTTTGAAGA 


TGAGAGGTCT 


AGAGACCATG


ACGATGGAGG 


CCTCTTGACC 


CTCTTTGCAC 


1140 


NTTNGTATTG 


ACCATTCCCC 


TGGAATCGAA 


TTACCTGACT 


TTTTTGCCAG 


TTGATGGGAA 


1200 


TGTGGTTGCC 


TATAGTGTTT 


TGATTTCAAT 


CATGTTAGGA 


ACGACCGTTT 


TTAGCAAGTC 


1260


TTATACGATT


GAGGATGCGG 


TTTTCCCTCT 


TGCTATGAGC 


TTCTACGTGG 


GCTTTGGATT 


1320


TAATGCTTTA 


CTAGATGCTC 


GTGTTGCAGG 


TTTGGACAAG 


GCTCTCTTAG 


CCTTGTGTAT 


1380


CGTCTGGGCG 


ACAGACAGTG 


GTGCCTATCT 


TGTTGGGATG 


AACTATGGGA


AACGAAAGTT 


1440 


AGCACCAAGG 


GTATCGCCTA 


ATAAAACCCT 


TGAGGGTGCC 


TTGGGTGGTA 


TTTTAGGAGC 


1500


AA 1 1 1 i AG J. A 


AG G A 1 1 A 1 G 1 


TTATGATAGT


1 GACAGTACA 


GTTGCTCTTC 


CATATGGAAT 


1560


1 lALAAbAlb 


1 GAG 1 G 1 1 1 G 


G1A1111G11 


1 AGG A TTGCT 


GGACAATTTG 


GTGATTTACT 


1620


AGAAAG I 1 GG 


A 1 G AAAGG 1 G 


Al 1 1 1GG1G 1 


1 AAGGA 1 TC T 


GGGAAATTTA 


TCCCTGGACA 


1680


1 GG X GG 1 G 1 1 


1 1GGA1 GG 1 I 


1 GGA1 AG 1 Ai 


Gl 1GGTTGTA 


TTTCCAATCA 


TGCACTTATT 


1740


1 GGAG 1 G 1 1 1 


1 AA 1 G AAAAG 


AGGGAGGAAA


GGG 1 A1GGTC 


GGAATTTTAA 


CCTTTATTCT 


1800


GGTTTTTGGG 


ATTATTGTAG 


TGGTGCACGA 


GTTCGGGCAC 


TTCTACTTTG 


CCAAGAAATC 


1860 


AGGGATTTTA 


GTACGTGAAT 


TTGCCATCGG 


TATGGGACCT 


AAAATCTTTG 


CTCACATTGG 


1920 


CAAGGATGGA 


ACGGCCTATA 


CCATTCGAAT 


CTTGCCTCTG 


GGTGGCTATG 


TCCGCATGGC 


1980 


CGGTTGGGGT 


GATGATACAA 


CTGAAATCAA 


GACAGGAACG


CCTGTTAGTT 


TGACACTTGC 


2040 


TGATGATGGT 


AAGGTTAAAC 


GCATCAATCT 


CTCAGGTAAA 


AAATTGGATC 


AAACAGCCCT 


2100 


CCCTATGCAG 


GTGACCCAGT 


TTGATTTTGA 


AGACAAGCTC 


TTTATCAAAG 


GATTGGTTCT 


2160 


GGAAGAAGAA 


AAAACATTTG 


CAGTGGATCA 


CGATGCAACG 


GTTGTGGAAG 


CAGATGGTAC 


2220 


TGAGGTTCGG 


ATTGCACCTT 


TAGATGTTCA 


ATATCAAAAT 


GCGACTTTAT 


CTGGGGCAAA 


2280 


CTGATTACCA 


ATTTTGCAGG 


TCCTATGAAC 


AATTTTATCT 


TAGGTGTTGT 


TGTTTTTTGG 


2340 
GTTTTAATCT


TTATGCAGGG 


TGGTGTCAGA 


GATGTTGATA 


CCAATCAGTT 


CCATATCATG 


2400 


CCCCAAGGTG 


CCTTGGCCAA 


GGTAGGAGTA 


CCAGAAACGG 


CACAAATTAC 


CAAGATTGGC 


2460 


TCACATGAGG 


TTAGCAACTG


GGAAAGCTTG 


ATCCAAGCTG 


TGGAAACAGA 


AACCAAAGAT 


2520 


AAGACGGCAC 


CGACTTTGGA 


TGTGACTATT 


TCTGAAAAGG 


GGAGTGACAA 


ACAAGTCACT 


2580 


GTTACACCCG 


AAGATAGTCA 


AGGTCGTTAC 


CTTCTAGGTG 


TTCAACCGGG 


GGTTAAGTCA 


2640 


GATTTTCTAT 


CCATGTTTGT 


AGGTGGTTTT 


ACAACTGCTG 


CTGACTCAGC 


TCTCCGAATT 


2700 


CTCTCAGCTC 


TGAAAAATCT


GATTTTCCAA 


CCGGATTTGA 


ACAAGTTGGG 


TGGACCTGTT 


2760 


GCTATCTTTA 


AGGCAAGTAG 


TGATGCTGCT 


AAAAATGGAA 


TTGAGAATAT 


TCTTGTACTT 


2820


CTTGGCAATG 


ATTTCCATCA 


ATATTGGGAT 


TTTTAATCTT 


ATTCCGATTC 


CAGCCTTGGA 


2880 


TGGTGGTAAG 


ATTGTGCTCA 


ATATCCTAGA 


AGCCATCCGC 


CGCAAACCAT 


TGAAACAAGA 


2940 


AATTGAAACC 


TATGTCACCT 


TGGCCGGAGT 


GGTCATCATG 


GTTGTCTTGA 


TGATTGCTGT 


3000 


GACTTGGAAT 


GACATTATGC 


GACTCTTTTT 


TAGATAATCG 


AGGAATATTA 


TGAAACAAAG 


3060 


TAAAATGCCT 


ATCCCAACGC 


TTCGCGAAAT 


GCCAAGCGAT 


GCTCAAGTTA 


TCAGCCATGC 


3120 


TCTTATGTTG 


CGTGCTGGTT 


ATGTTCGCCA 


AGTTTCAGCA 


GGTGTTTATT 


CTTATCTACC 


3180 


ACTTGCCAAC 


CGTGTGATTG 


AAAAAGCTAA 


AAACATCATG 


CGCCAAGAAT 


TCGAAAAGAT 


3240 


TGGTGCTGTT 


GAGATGTTGG 


CTCCAGCCCT 


TCTTAGTGCA 


GAATTGTGGC 


GTGAATCAGG 


3300 


TCGTTACGAA 


ACCTATGGTG 


AAGACCTTTA 


CAAACTGAAA


AACCGTGAAA 


AATCAGACTT 


3360 


TATCTTAGGT 


CCAACTCACG 


AAGAAACCTT 


TACAGCTATT 


GTCCGTGATT 


CTGTTAAATC 


3420 


TTACAAGCAA 


TTGCCACTCA 


ACCTTTATCA 


AATTCAGCCC 


AAGTATCGTG 


ATGAAAAACG 


3480 


CCCACGTAAT 


GGACTTCTTC 


GTACACGTGA 


GTTTATCATG 


AAGGATGCTT 


ATAGTTTCCA 


3540 


CGCTAACTAT 


GATAGTTTGG 


ATAGTGTTTA 


TGATGAGTAC 


AAAGCAGCCT 


ATGAGCGTAT 


3600 


TTTCACTCGT 


AGTGGTTTAG 


ACTTCAAGGC 


TATTATTGGT 


GACGGTGGAG 


CCATGGGTGG 


3660 


TAAGGATAGC 


CAAGAATTTA 


TGGCCATTAC 


ATCTGCTCGT 


ACAGACCTTG 


ACCGCTGGGT 


3720 


TGTCTTGGAC 


AAGTCAGTTG 


CCTCATTTGA 


CGAAATTCCT 


GCAGAAGTGC 


AAGAAGAAAT 


3780 


CAAGGCAGAA 


TTGCTCAAAT 


GGATAGTCTC 


TGGTGAAGAT 


ACCATTGCTT 


ACTCAAGTGA


3840 


GTCTAGCTAT 


GCAGCTAACT 


TAGAAATGGC 


AACAAACGAG 


TACAAACCAA


GCAACCGTGT 


3900 


TGTCGCTGAA 


GAAGAAGTTA 


CTCGTGTTGA 


AACGCCAGAT 


GTTAAATCAA 


TTGATGAAGT


3960 


TGCAGCCTTC 


CTCAATGTTC 


CAGAAGAACA 


AACGATTAAA 


ACCCTCTTCT 


ACATTGCAGA 


4020 


TGGTGAGCTT 


GTTGCAGCCC 


TTCTAGTTGG 


AAATGACCAA 


CTCAACGAAG 


TCAAGTTGAA 


4080 


AAATCACTTG 


GGAGCAAATT 


TCTTTGACGT 


TGCTAGCGAA 


GAAGAAGTGG 


CGAATGTTGT 


4140 


TCAAGCAGGA 


TTTGGTTCAC 


TTGGACCAGT 


TGGTTTGCCA 


GAGAATATTA 


AAATTATTGC 


4200 


AGATCGTAAG 


GTGCAAGATG 


TTCGCAATGC 


AGTTGTCGGT 


GCTAACGAAG 


ATGGCTACCA 


4260 


CTTGACTGGT 


GTGAACCCAG 


GCCGTGATTT 


TACTGCAGAA


TATGTGGATA 


TCCGTGAAGT 


4320 


1 CGTGAGGGT 


GAAATTTCCC 


CAGATGGACA 


AGGTGTCCTT 


AACTTTGCGC 


GTGGTATTGA 


4380


GATCGGTCAT 


ATTTTCAAAC 


TCGGAACTCG 


CTATTCAGCA 


AGCATGGGAG 


CAGATGTCTT 


4440 


GGATGAAAAT 


GGTCGTGCTG 


TGCCAATCAT 


CATGGGATGT 


TACGGTATCG 


GTGTCAGCCG 


4500 


TCTTCTTTCA 


GCAGTGATGG 


AGCAACACGC 


TCGCCTCTTT 


GTTAACAAAA 


CGCCAAAAGG 


4560 


TGAATACCGT 


TACGCTTGGG 


GAATCAATTT 


CCCTAAAGAA 


TTGGCACCAT 


TTGATGTGCA 


4620 


TTTGATTACT 


GTTAATGTCA 


AGGATGAAGA 


AGCGCAAGCC 


TTGACAGAAA 


AACTTGAAGC 


4680 


AAGCTTGATG


GGAG 










4694 



(2) INFORMATION FOR SEQ ID NO: 48: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1352 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48: 



CI CG rAACjT r 


CGGAAGCTAT


CTACACAAGA 


AATTAACCGC 


TGCCTAAAGG 


AGAAGCCATG 


60 


1 LAALA 1 A 1 A 


AC 1 GGGATG A 


GAAGCATATC 


CTTACCTTTC 


CTGAAGAAAA 


AGTAGCCCTT 


120 


TCTACTAAGG 


ATGTCCATGT 


TTACTATGGT 


AAAAATGAAT 


CCATTAAGGG 


GATTGATATG 


180 


CAATTTGAAA 


GAAATAAAAT 


TACAGCTTTG 


ATTGGTCCGT 


CGGGATCGGG 


GAAATCTACC 


240 


TACTTACGCA 


GTCTCAATCG 


CATGAATGAT 


ACCATTGATA 


TTGCTAAAGT 


AACTGGGCAG 


300 


ATTCTCTATC 


GTGGAATTGA 


TGTCAACCGT 


CCAGAAATCA


ACGTTTATGA 


AATGCGTAAA 


360 


CACATTGGAA 


TGGTTTTTCA 


ACGCCCCAAT 


CCATTTGCTA 


AATCGAATTT 


ACCGTAATAT 


420 


TACCTTTGCG 


CATGAACGTG 


CTGGAGTTAA 


GGATAAGCAA 


GTCCTAGATG 


AAATCGTAGA 


480 


AACCTCCCTT 


AGTCAGGCTG 


CCCTTTGGGA 


TCAGGTTAAA 


GACGATCTCC 


ACAAGTCAGC 


540 


CTTGACCTTA 


TCAGGTGGTC 


AGCAACAACG 


TCTCTGTATC 


GCTCGTGCCA 


TCTCTGTTAA 


600 


GCCAGATATC 


CTCTTAATGG 


ATGAGCCAGC 


CTCAGCCTTG 


GATCCGATTG 


CGACCATGCA 


660 


ACTAGAAGAG 


ACCATGTTTG 


AGCTCAAGAA 


AAACTTTACC 


ATCATCATTG 


TAACGCATAA 


720 


TATGCAGCAG 


GCTGCTCGTG 


CAAGTGACTA 


TACAGGCTTC 


TTTTACTTGG 


GTGATTTGAT 


780 


TGAGTATGAC 


AAGACTGCAA 


CTATTTTCCA 


AAATGCCAAG 


CTACAGTCCA 


CCAATGACTA 


840 


TGTATCTGGT 


CACTTTGGTT 


AGAAAGGAAA 


CCGTATGACA 


GATGCGATTT 


TACAGGTATC 


900 


AGACCTGTCC 


GTTTATTATA 


ATAAAAAGAA 


GGCTTTGAAT 


AGTGTTTCCC 


TATCTTTCCA 


960 


ACCTAAGGAA 


ATTACAGCCT 


TGATTGGTCC 


ATCTGGATCA 


GGGAAGTCAA 


CCCTCCTCAA 


1020 


GTCTCTCAAC 


CGCATGGGAG 


ATCTCAATCC 


AGAGGTGACC 


ACAACTGGAT 


CCGTGGTGTA 


1080 


CAATGGTCAC 


AACATCTACA 


GTCCGCGTAC 


AGATACGGTT 


GAATTACGTA 


AGGAAATCGG 


1140 


AATGGTTTTC 


CAACAACCTA 


ATCCTTTCCC 


TATGACTATC 


TATGAGAATG 


TTGTCTACGG 


1200 


GCTTCGTATC 


AATGGAATTA 


AGGATAAGCA 


GGTTCTGGAT 


GAAGCCGTAG 


AAAAAGCCTT 


1260 


GCAAGGTGCC 


TCTATCTGGG 


ATGAGGTCAA 


GGATCGTCTA 


TATGATTCAG 


CTATTGGATT 


1320 


GTCAGGTGGT 


CAACAGCAGC 


GTGTCTGCGT 


GG 






1352 



(2) INFORMATION FOR SEQ ID NO: 49: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2258 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49: 
a a pmmpp &cc


prpp ATA A ApA 
b X bA 1 AAAb A 


AbjL 1 LjALjL 1 1 


rpp A p a m a pmm 

1 bALATACTT 


m TV TV TV jr-i Pi 

GTAGCCAACC 


m TV TV TV TV S~~% f~~% m 

TAAAAGCCGT 


60 


r vc r p r vr i a 7\czac 


prpp A A A PP A P 


prnpp a A prpp A 
t 1 bLAAL 1 LA 


TirnpApp a APP 

1 1 LAbbAAbL 


prnpp a rnp aaa 

L rbGATGAAA 


a mp a t\ /~i mp/^p 

ATGAAGTGGC 




Tdco a & r va r w 


A AAPPAPAP 


p A p A A prpp AT 1 
bAbAAL 1 LA 1 


PAPA APA A prp 

LALAAbAAL 1 


PA apa A A mmp 

bAAbAAA P TL 


p a mmmp a a p m 

LATTTGAAGT 


180


X t\ X ^i-Vrt.^ii^_rVr\ 


PAAAATPPTA 


r-i 1 LUtriVjU X Vjj 


cn^r^ 7\ A A A rp 
b 1 b Abb AAA 1 


a mm a mp a p a p 
A 1 1 A 1 LAL Ab 


P A PP A PmP A A 

CAGGAbTLAA 


240




A prpp a rnm a p A 


TPTPTP T 1 A P rp 


P A r^TT^ A A A A rp 
b AL 1 bAAAA 1 


PPAAAAAPAA 

bbAAAAALAA 


PAPA A APA P m 

L AGAAAC AG T 


T A A 

300


PrTTPATAPr 
\ — J- X J. rTk^jrv^ 


PAf^riTAAPP A 
^Jt\\D\3 X tt-rt\ v_-.rt 


A APA Aprprn a rn 
/\/\br/\Abi 1 1 A 1 


AAAPPA AP rpp 

AAA L LAAb 1 b 


pmmp a a pmmp 
b 1 1 bAAbTTG 


P P/~i/~im/~iPm/ , "*»m 

GCGCTCCTGT 


360


AAfTPAPAAP 


PP^HA^nA A A 


r^TT^ip t 1 p rp rn r 1 

Vj X bx^j 1L1 1 bib 


APPA A P rn A P rp 

ALLAAL 1 AL 1 


P A ppm A A A A P 

b Abb 1 AAAAL 


pm apa prnpp a 

L T AGAC TGG A 


420


TATPPAAPA A 


p A An A A ATTT* 


p A rprprp a PP A P 
bAl 1 1ALLAL 


Ab 1 bAL 1 L b X 


P» A A A A m/"*1P A P 

GAAAATCCAC 


m/^mmTv m /^i tv tv 

TCTTACTCAA 


480 


A fi<^ A A A A A C* A 


p A A prpp A rprp a 


b 1 AAbjbjbr 1 bi 1 


P A A rnpp apa rn 

L AA 1 bbACAT 


CGTAGCAACT 


TCTACTCTGT 


540 


PAPPaHTTTT 
bAbb AL i it i 


bbLbA 1 bb 1A 


A P P A A P rpp A A 

AbrbjAAbi 1 brAA 


A A /""» A Pmmp m "a 

AALALTTGTA 


AATAGTGTCG 


TAGCACAGGA 


600 


A CCCCW ACT 
/ibbbbl XAb 1 


PA A A T 1 AP""~PPP 
bAAAl Ab iLb 


A a p n^r^r^r^ a a p 
AAbr 1 LbibjAAL 


m a mpp m a a p a 

1 A 1 bb 1 AAL A 


CATGTAGGCG 


ATGAAAACGG 


660 


/\b AAbbbbb 1 


A rprppprpr' A A P 
All bb 1 bAAb 


A A A A A PP AAA 

AAAAALLAAA 


ACTAGAAATC 


CTAAGCCAAC 


CAGCTCCTGC 


720 


1 bAbbAAAbb 


A A A PPrpprnrnp 
AAAbb 1L 1 iL 


C 1 LAAbrATCC 


7\ m /^i m ppi 

AGCTCCTGTG 


GTAATAGAGA 


AAAAACTTCC 


780 


1 bAAAb AbbA 


A r^T^r^ A PP A rprp 

AL 1 L ALbA 1 1 


L 1 bjb. AbjCjCjAC 


rAGTAGTCGC 


AGGACTCATG 


GCCACACTAG 


840 


b AbL L1A1 bb 


AprppAprpA A A 

Ab 1 LAL 1 AAA 


AbAAAAGAAG 


ACTAAGTCTT 


TTCGATAAAA 


AATAAACAGC 


900 


PAPA 1 ! II | lp A A P 
bAbA 1 1 b AAb 


r°p /~» /-"i prnpmrnrn 
L 1 Lbb Ibl 1 I 


A 1 1 1 1 1 1 AA 1 


TAATCACCTA


GTCCAAGACG 


TTCAAAGATA 


960 


rnp A mpp 7\ pirn pi 

1 LAI LLAL 1 L 


prprprp/^/^rTi/™irn 7\ 
b 1 1 1 bb 1 b 1 A 


A rp A A A /^mOO/**' 

A 1 AAAb 1 bibjG 


TTGAAGATTT 


CATCGATTTC 


TTCTTGTGTG 


1020 


APA PPffip A Tip 
AbALb 1 bA X b 


rnrri 7\ rn rn rn/"" 1 t\ 

1 1 AC 1 1L1 b A 


A rnpmpr<pmp a 

AlLTbbCTCA 


AGAAGTGGTT 


TAAAGTCTAC 


TTGGTTGTCC 


1080 


PA APAPTIAPP 
b AAbAb 1 Abb 


rn/""^ rnrnrnrnrn/'^ 1 
Lib! 1111 bb 


1 lbrLACCAAG 


TCATAGGCTT 


GCTCACGGGT 


CATGCCTTTT 


1140 


rpp A A mp a A T>P 
1 LAA 1 b AA 1 b 


rnp a APArpAPP 

1 LAALA 1 Abb 


PPPTirnpprimTi 
LLbil lbibjb TA 


AAGATAAGAC 


CAAAAGTCGA 


GTTCATGTTT 


1200 


ppp a mp A T 1 A rp 
b bbA 1 LA 1 A i 


rnrnrnr^rnc^ P p A A 
1 1 1L1 bbbAA 


P A /~"rp/~i mH'Ti AP 

biAb 1 br 1 C AAbr 


TTlTTGACGA 


TATTTCCAAA 


ACGGTTGAGC 


1260 


A rpprpTi prpp a A 
A 1 b 1 Ab X b AA 


rpp a A A A <~Pr~T^n~i 

1 LAAAA 1 bb 1 


pprp a <~nfTr\f~*f~*rn 
LblAlL lbibi 1 


GTGATGATAC 


GCTCAGCTGA 


TGAGTGAGAA 


1320


A T 1 A T>PP PPT"Ti 


bb 1 bbLAbAb 


A pr^/~« a ppmmrn 
AbrbbrACbi ill 


TCATAAGCCG 


TAATCATGTG 


ACCACGAATG 


1380


AbAbbbbbL A 


p a PP APfTT 1 ATI 

bAL LAb 1 LA 1 


A rprprnrn/^ a P A A 

A 1 1 1 1 LAbAA 


CCGATTGGGT 


TGCGTTTGTG 


AGGCATTGCT 


1440 


p A AP 7±C*C , r*rnrp 
bAAbALLL 1 1 


111 bLLL 111 


APPA A APA AP 

Abr b AAAbrAAb 


TCTTCTACTT 


CGCGTTGCTC 


AGATTTTTGT 


1500 


APAPPAPPA A 


mpmp a r^rr^r^ f~~* 
ill LAb 1 Lbb 


P A rp A p prpmpp 
LAI Ab.br 11 Lbi 


ATTGAAGTCG 


CAATGCTGGC 


AAGAACCGCA 


1560 


A Af^TAPTPAP 
.rt-rlb X X b Ab 


r^r^T^c^ a a pp mp 
bb 1 bAAbb 1 b 


APPAPPA APP 

AL L Abb AAbrbi 


AL 1 lbTGTTA 


AAGATTCCTT 


GGGCACGGAT 


1620


Ppp A Ap A rnrpm 
bbbAAbAl 1 1 


a mppp A P A P A 
A 1 LbLAbALA 


1 AL 1 L L 1 L 1 A 


p AAA mp pmp/^ 

LAAATGGTGG 


GATATTGGCA 


AAGTTCCCAA 


1680 


ppppappapa 


A A rpprnrp a pp a 


bib 1 1L1 AL AL 


PAPPAPPPPP 

LAbLAbLLbL 


a m/^»PmPP a a p 

ATGLTCGAAG 


CGCTCGATAT 


1740 


rnp pprprprpp A T 1 
Xbbbl X 


rprppp p mp rn a p 
1 ILbblblAL 


LAAb 1 1 bib 1 A 


A rprprp a a p a pp 

All lAAbALL 


AAAGGTTGTC 


GGCTCAGCGT 


1800 


PPAPAPPATP 


A P4 T a r 1 m r* H C C 


A TP A Tf2 A nnpp 

1 b- A 1 bfA 1 ub 


rpp A A prp rpp rpp 
1 b AAL 1 1 b 1 b 


pmppmmpppp 
L 1 LL 1 IbbLL 


mmp»mp a pp/^« "a 

TTGTCAGCGA 


1860


TCi A T A T 1 m A d T 

X VJ-rt X -C\ X X -rt-^JT X 


PA A Cl'W'WC A 
Vjr.ttrt.Vj X X X X Vwrt. 


A n prpp A HH A r 1 
Abb 1 LALbAL 


PP A rpp a mpmp 
bbA 1 bA 1 b 1 L 


pmmppppmpp 
b 1 i bbCCTGC 


TTGTAGAGGT 


1920


AACCATAAGC 


AGTATCCACC 


ACGTCGGTAG 


AAGTTAACCC 


ATAGTGAACC 


CACTTGCGCT 


1980 


CTTCACCAAG 


AGTCTCAGAA 


ACCGCACGCG 


TGAAAGCCAC 


CACATCGTGG 


CGCGTCTCCT 


2040 


GCTCAATTTC 


CAAAATACGG 


TCGATGTCAA 


AGTCCGCCTT 


CTTGCGAATC 


AAAGCCACAT 


2100 


CTTCCTTAGG 


GATTTCCCCC 


AACTCAGCCC 


ATGCCTCGTC 


AGAGAGGATT 


TCCACCTCAA 


2160 


GCCAAGCACG 


GTATTTATTT 


TCTTCACTCC 


AAATATTCGC 


CATCTCAGGG 


CGAGAGTAAC 


2220 


GGTTGATCAT 


GTGTTAATTT 


TTCCTTTCTT 


CTTAAGAT 






2258 



(2) INFORMATION FOR SEQ ID NO: 50: 
(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 4392 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50: 



CCCTTTTGCC 


TCTCCCTTTG 


GTGCAGATTC 


TTTTGGGAAT 


TGTGATTGGT 


CTCTTTTTAC 


60 


CCAATACTGA 


CTTTCATCTT 


AATACGGAGT 


TGTTTTTGGC 


CTGGTTATCG 


GACCCTTGCT 


120 


TTTCCGAGAG 


GCTGAAGAAG 


CAGATGTTAC 


GGCTATTTTA 


AAACACTGGC


GAATCATTGT 


180 


TTATCTCATA 


TTTCCAGTGA 


TTTTTATCTC 


GACCCTGAGT 


TTGGGTGGCT 


TGGCCCATCT 


240 


TCTTTGGTTC 


AGCCTTCCCT 


TGGCAGCTTG 


CTTGGCTGTT 


GGGGCAGCCC 


TTGGTCCTAC 


300 


GGACTTGGTG 


GCCTTTGCCT 


CTCTTTCGGA 


GCGTTTTAGC 


TTTCCTAAGC 


GCGTGTCCAA 


360 


TATTCTTAAG 


GGCGAAGGAC 


TCTTGAATGA 


TGCTTCTGGT 


TTGGTGGCTT 


TTCAGGTAGC 


420 


TTTGACAGCT 


TGGACAACTG 


GAGCTTTTTC 


TCTGGGGCAA 


GCTAGCAGTT 


CGCTCATCTT 


480 


TTCAATCCTA 


GGCGGTTTTT 


TAATTGGATT 


TTTAACAGCC 


ATGACCAACC 


GCTTCCTCCA 


540 


TACCTTCTTG 


CTAAGTGTGC 


GCGCAACGGA 


TATTGCCAGT 


GAACTTTTAT 


TAGAATTCGA 


600 


GTTTGCCTCT 


AGTGACCTTC 


TTTCTGGCAG 


AAGAAGTCCA 


TGTTTCAGGG 


ATTATTGCCG 


660 


TCGTAGTTGA 


TCGAATTTTA 


AAGGCAAGTC 


GCTTCAAGAA 


AATCACGCTC 


CTCGAAGCCC 


720 


AAGTGGATAC 


GGTGACCGAG 


ACGGTCTGGC 


ATACAGTGAC 


CTTTATGCTC 


AACGGTTCTG 


780 


TCTTTGTGAT 


TTTAGGGATG 


GAGTTGGAAA 


TGATAGCAGA 


ACCTATCTTG 


ACCAATCCAA 


840 


TCTATAATCC 


TCTACTTTTA 


TTGCTATCTC 


TCATCGCCCT 


TACCTTTGTC 


CTCTTTGTCA 


900 


TTCGTTTTAT 


TATGATCTAT 


GGCTATTATG 


CCTATAGAAC


CCGACGCCTA 


AAGAAAAAGC 


960 


TAAATAAGTA 


TATGAAGGAC 


ATGTTTCTCT 


TGACCTTTTC 


AGGTGTTAAG 


GGAACGGTGT 


1020 


CGATTGCTAC 


GATTCTCTTG 


ATACCAAGTA


ATCTAGAACA 


GGAGTATCCT 


CTCTTGCTTT 


1080 


1 LL I 1 bl TGC 


AGGTGTGACG 


CTTGTCAGCT 


TTTTAACAGG 


TCTCTTGGTC 


TTGCCTCATC 


1140 


1 1 lbl bA 1 bA 


AbAbbAAbAA 


AGCAAGGATT 


ATCTCATGCA 


TATCGCCATT 


TTGAATGAAG 


1200


TAACGCTAGA 


GTTGGAAAAA 


GAGTTGGAAG 


ACACCAGAAA 


TAAACTTCCC 


CTCTATGCGG 


1260 


CTATTGACAA 


TTCGATCATG 


GACGTATTGA 


AAATCTCATT 


TTAAGCCAAG 


AAAACCAGGA 


1320 


TGATCAAGAA 


GACTGGGCTG 


CTTTGAAAAT 


CGAATTCTTA 


GTATTGAAAG 


TGATGGTTTG 


1380 


GAACAGGCCT 


ATGAAGAGGG 


GAACATTAGC 


AATCGTGCTT 


ACCGAGTTTA 


CCAACGTTAT 


1440 


CTGAAAAATA 


TAGAACAAGG 


AATCAATCGT 


AAACTTGCCT 


CAAGACTGAC 


CTATTATTTT 


1500 


CTTGTTTCCT 


TGAGGATTTT 


ACGTTTTCTT 


CTTCATGAAG 


TTTTTACTCT 


TGGAAAGACC 


1560 


TTCCGTAGCT 


GGAAGGACAA 


GGAGCAAAGC 


CGTCTCCGTG 


CTCTTGATTA 


TGACCAAATT 


1620 


GCAGAGCTCT 


ATCTTGCCAA 


TACAGAGATG 


ATTATTGAAA 


GTTTGGAAAA 


CCTGAAGGGA 


1680 


GTCTACAGAC 


GCTCTTTGAT 


TAGTTTTATG 


CAGGAGTCTC 


GTCTTCGAGA 


AACAGCTATT 


1740 


ATCAGCAGTG 


GTGCCTTTGT 


CGAACGGGTT 


ATCAATCGTG 


TCAAACCCAA 


CAATATCGAT 


1800 


GAAATGCTGA 


GAGGCTATTA 


TCTGGAGCGC 


AAGTTGATTT 


TCGAATACGA 


AGAAAAACGA 


1860 


TTGATTACGA 


CTAAGTATGC 


CAAGAAATTA 


CGACAAAATG 


TAAATAACTT 


AGAGAACTAT 


1920 


TCCTTGAAGG 


AAGCTGCCAA 


TACCCTGCCG 


TATGATATGG 


TGGAATTGGT 


AAGAAGAAAT 


1980 


TAGTTAATAC 


TCTTCGAAAA 


TCTCTTCAAA 


CCACGTCAGC 


GTCGCCTTGG 


ATTATATATG 


2040 
TGACTGACTT


CGTCAGTTTC 


ATCTACAACC 


TCAAAGCAGG 


GCTTTGAGCA 


ACCTGCGGCT 


2100 


AGCTTCCTAG 


TTTGCTCTTT 


GATTTTCATT 


GAGTATAAGA 


TTGTAAGTGA 


AGGAGTGTGA 


2160 


CATGAAAAAA 


TGGGGAAAGA 


GCCTGAACTA 


GTCCTGTCTA 


CTTTTACCCA 


ATCACACTTC 


2220 


CATTTGGTAC 


AGCTGGATCA 


ACTGTGAGAA 


GGGATCGAAT 


TTGCCATCAT 


GTTCAGCTGA 


2280 


GAGAATCATA 


CCCTGGCTGA 


CATATTTTTT 


CATCATTTTA 


CGTGGTTTGA 


GGTTAGCAAC 


2340 


GATTTGAACT 


TTCTTGCCGA 


CCAATTCTTG 


TTCATTTGGA 


TAGTATTTTG 


CAATTCCTGA 


2400 


AAGAATCTGA 


CGATCTTCTC 


CATCACCAGC 


ATCCAAGCGG 


AATTGAAGCA 


ACTTATCTGA 


2460 


ACCTTCTACT 


TTAGACACTT 


CTTTGACTTC 


TGCGACACGG 


ATTTCAACCT 


TGTCAAAGTC 


2520 


TTCAAACTTG 


ATTTCATCCT 


TGTTTAGTTT 


GAGCTCAACT 


TCGTCCGGAT 


TCCATTCTTT 


2580 


TTCGACTGCT 


GGTTTATTGC 


CTTCCATTTG 


TTCCTTGATA 


TAGGCGATTT 


CTTCTTCCAT 


2640 


ATTTAGACGT 


GGAAAGATAG 


GTGTTCCTTT 


GGCAACTACA 


GTCACATCTG 


CTGGGAAGTC 


2700 


AGCCAAACTC 


AAGTTTTCAA 


GACTAGAAAC


TTCTTCCAAA 


CCAAGTTGAG 


TCAAAACTGC 


2760 


ACGACTAGTT 


TCCATCATAA 


ATGGTTCAAT 


CAAGTGAGCA 


ACTACACGAA


TGCTGGCTGC 


2820 


CAAGTGGCTC 


ATGACACTTG 


CCAATTGGTC 


ACGAAGAGCT 


TCATCCTTGT 


CCAAGACCCA 


2880 


TGGTGCAGTC 


TCATCGATGT 


ATTTATTGGT 


ACGAGAGATC


AGAGTCCAGA 


CTGCTTCAAG 


2940 


CGCACGTGGA 


TAGTCAACTG 


CTTCCATGTG 


TGTATGGAAG 


TCTGCGATTG 


ATTTTTCTGC 


3000 


AACCTCAGCA 


AGAACATGAT 


CAAATTCAGT 


CACACCTTCT 


ACATAGGCAG 


GGATTTGTCC 


3060 


ATCAAAGTAC 


TTATTAATCA 


TGGAAACCGT 


ACGGTTAAGG 


AGGTTCCCAA 


GGTCATTAGC 


3120 


CAATTCATAG 


TTGATACGAC 


CGACATAGTC


TTCAGGAGTA 


AAGGTTCCGT 


CTGAACCAAC 


3180 


TGGAAGGTTA 


CGCATGAGGT 


AGTAACGAAG 


TGGATCTAGT 


CCATAACGCT 


CTACCAACAT 


3240 


TTCAGGGTAA 


ACGACATTCC 


CTTTTGACTT 


AGACATTTTT 


CCGTCTTTCA 


TGACAAACCA 


3300 


ACCATGGGCA 


ATCAAACGAT 


CAGGTAATTT 


AACATCCAAC 


ATCATAAGAA 


GGATTGGCCA 


3360 


GTAGATAGAG 


TGGAAGCGAA 


GGATGTCTTT 


TCCTACCATA 


TGGAAGACTG 


TTCCATTCCA 


3420 


GAACTTGTCA 


AAGTTACCAT 


GTTCGTCTTG 


AGCGTAGCCA 


AAAGCTGTCG 


CATAGTTAAG 


3480 


AAGGGCATCA 


ATCCAAACGT 


AGACAACGTG 


TTTTGGATTT 


GATGGGACAG 


GCACTCCCCA 


3540 


TGTAAAGGTT 


GTACGAGATA 


CCGCCAAATC 


TTCCAAACCT 


GGCTCGATGA 


AGTTGCGTAG 


3600 


CATTTCATTA 


AGACGACCAT 


CTGGCGTGAT 


AAATTCAGGA 


TGAGCTTTGA 


AAAATTCGAC 


3660 


CAAACGGTCT 


TGGTATTTGC 


TAAGGCGAAG 


GAAGTATGAT 


TCTTCAGAAA 


CCCATTCAAC 


3720 


CTCATGACCT 


GATGGAGCAA 


TACCACCAGT 


CACATTTCCA 


GCTTCATCAC 


GGAAAACTTC 


3780 


TGCCAGCTGG 


CTTTCTGTAA 


AGAATTCTTC 


GTCTGATACT 


GAATACCAAC 


CAGAGTATTC 


3840 


ACCCAAGTAG 


ATATCATCTT 


GAGCAAGTAA 


GCGTTCAAAG 


ACCTGTGCGA 


CAACTTTTTC 


3900 


ATGGTAGTCA 


TCGGTTGTAC 


GGATAAATTT


ATCGTATGAG 


ATATCTAGTA 


ATTGCCAGAG 


3960 


TTCTTTAACT 


CCAACCGCCA 


TTCCATCAAC 


ATAGGCTTGA 


GGTGTAATAC 


CAGATTCGAA 


4020 


TTCCGCTTTC 


TGCTGGATTT 


TCTGACCATG 


TTCATCAAGA 


CCTGTCAGAT 


AAAATACATC 


4080 


GTAGCCCATC 


AGGCGTTTGT 


AACGTGCTAG 


GACATCACAT 


GCGATAGTTG 


TGTAGGCAGA 


4140 


ACCGATATGA 


AGTTTCCCAG 


ATGGATAGTA 


AATCGGCGTT 


GTAATATAAA 


AATTTTTTTC 


4200 


AGACATAATT 


TTTCCTTTCC 


AGGCAAATGA 


AACCTGTTTT 


TCTAACACTT 


CATTATATCA 


4260 


CATTTTTAAT 


GAATTTCGAT 


AGGGAAATCC 


ATACCAAAAC 


AAGATAGACG


AGTGTCCATC 


4320


TTGTTGATCT 


CATTCATAAC 


GAAGGGCTTC 


AATTGGATCA 


AGTTTCGATG 


CCTTGTTGGC 


4380 


TGGCAAGACT 


CC 










4392 



(2) INFORMATION FOR SEQ ID NO: 51: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1941 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51: 



AATTAGTATT 


CTCAACCTTT 


TTATCTTGAT 


AGTTCAAGAT 


GGCATTCGTT 


GAATTGGTAA 


60 


CATAGTAACT 


ATCCACTCCC 


TTCAGTTTAG 


CTGCCTCTTG 


AACCCAGGAT 


TCTTGCGGTT 


120 


TTGGCGGTTC 


AACAGGAATT 


CTTTTTCTTT 


TCCAGAAACC 


GTAAAAGCTG 


ATTGTTTCTG 


180 


AGTAAAAGAC 


CCATCTTTAC 


TTTTTTTAGG 


AGAGAAAAAG 


ACGCTAATAT 


TTTTCTGAGA 


240 


TTTAGTCATA 


TCTTTATTGA 


CTTGACGAGA 


TAGGGAATCA 


CCCAAAGCCA 


TAATCACAAC 


300 


AACTGATGAA 


ACACCGATAA


TAATCCCAAT 


CATAGTAAGC 


AAAGAACGCA 


TCTTGTGAGC 


360 


CATGATAGAT 


GAAAAGGCAA 


ATTTCAGATT 


CTGCATCTTA 


GTTTTCCTCC 


TTTCCTAACT 


420 


GAGCACTGTC 


AGACGAAATG 


ACCCCATCCC 


GAATGACAAT 


CTGACGTTTG 


GCATAGGCAG 


480 


CAATCTCAGG 


CTTCATGCGT 


TACCATGATA 


ATGGTTTTTC 


CTTCTTTATT 


CAAATCAACC 


540 


AATAATTGCA 


TAATTTGGTT 


ACCTGTTTTG 


GTATCCAAGG 


CTCCTGTCGG 


TTCATCCGCT 


600 


AGGATAATAG 


AAGGATTGTT 


TACCAAGGCA 


CGCGCAATGG 


CTACACGTTG 


CTTTTGACCA 


660 


CCAGATAATT 


CTGAAGGTAA 


ATGGTGACTA 


CGTTCTATCA 


ATTCAACCTT 


GTCTAAATAT 


720 


TCCTCAGCCA 


ACTTGCGACG 


TTTTGAAGAC 


GAAACTCCTG 


CGTAAATCAA 


GGGCAATTCT 


780 


ACATTTTGCA 


GAGCATTGAG 


CTTCGATAGA 


AGAAAGAACT 


GCTGAAAGAC 


AAAACCGATT 


840 


TGTTGGTTAC 


GGACCTTAGC 


TAGTTGTTTT 


TCACCAAGCC 


CAGCCACTTC 


TTGACCTTCA 


900 


AGATAATATT 


CTCCACTGGT 


TGGTGTATCC 


AACATGCCAA 


TCGTATTCAT 


CAGAGTGGAC 


960 


TTACCAGACC


CAGATGGTCC 


CATGATGGCT 


ACAAATTCAC 


CCTCATTCAC 


TTCTAGATTG 


1020 


ATATTTTTGA 


GAACCTGCAG 


TTCTTGGTCA 


CCATTACGGT 


AACTTCTGAA 


GATATTTTTT 


1080 


AGACTAATTA 


GTTGCTTCAT 


CAGCCTTCAC 


CTCTTTTCCT 


TCTTCCAAGG 


AAGATGTTGG 


1140 


ATTACTGATG 


ACCTTAGCAC 


CGTTCGTTAA 


ACCAGAAGTG 


ATTTCTTGAT 


TTTCTGCGTC 


1200 


AGCATTTCCC 


AATGAAACCT 


CAACTTTTTT 


AGCCTTTTGT 


TGTTCATCCA 


CAATCCAGAC 


1260 


ATAATTTTTA 


CTATCATCCA 


TTACTAGACT 


GCTAACAGGA 


ACAAGAATAG 


CCTTAGTTTT 


1320 


GCTTTTAACC 


TCAATGTTGA 


CAGAAAAACC 


TTGTTTCAAA 


TCACCAACCT 


CGCCTGTCAC 


1380 


ATCAATAGTA 


TAAGGGTATT 


TAGAACCTGT


ATTATTCCCG 


GCTGCTGGAC 


TAGCTGCTTC 


1440 


ACCATTGTTT 


TTAGGATAGT 


CAGAAATATA 


GGCTTAATTT 


CCCAGTCCAT 


TTTTTATCAG 


1500 


GATACACTTT 


AGAAGTAAAG 


CTTACTTCTT 


GACCTACAGA


AAGGTTGGCT 


AGATTGTACT


1560 


CAGACAATTC 


TCCCTTGACT 


TGTAAATTTT 


CATTGCTGAC 


AATATGAACC 


ATAACTTGAC 


1620 


TCGCCCCTGT 


TGGAGATTTA 


GAAACATTGC 


TATTGACTTC 


GACTACAGTT 


CCCTCTAGGG 


1680 


TACTGAGAAC 


AGTTGTTGCA 


TCCAATTGAC 


TTTGAGCCTT 


GCTTAATTGC 


GCTGCAGCAT 


1740 


CTGCACGCGC 


ATCACGGGCA 


TCACCCAATT 


GAGCATCAAT 


AGAAGCAACA 


GAATTTCCAG 


1800 


CCACTGGAGT 


TGGGCTTTGC 


ACCGTTGCAT 


CTTCTCCTCC 


TACTGGCGCT 


GGTAACTGTG


1860 


GAGCCTGAGC 


TGAAGCGGCT 


TCATTTCGTG 


CTTGATTGAG 


TTCATTGATA 


TGACGATCTG 


1920 


CCTTAGCTAC 


TGCTCGACTA 


G 








1941 
(2) INFORMATION FOR SEQ ID NO: 52:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1335 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52:



ATCGAATTCC 


CTATTTTAAC 


ACTTTCTTTT 


CTAAAACAGT 


CTATATTTTA 


TTTCAAACTG 


60 


TATTATATTT 


TTGAAAAAAT 


AAAGTCCTTT 


TTTCTTTTTT 


TCAGAAAAAA 


GGGTATAATA 


120 


AAAGAAAATA 


AGCAGTAACA 


CTCAATGGAA 


ATCGAAAAAG 


CAAACTAGGA 


AGCTAGCCGC 


180 


AGATTGCTCA 


AAACACTGTT 


TTGAGGTTGC 


AGATAGAGCT 


GACGTGGTTT 


GAAGAGATTT 


240 


TCGAAGAGTA 


TAAAAAGGTG 


CTAGGCATGT 


TGATTTTTCC 


TTTGTTAAAT 


GATTTGTCAA 


300 


GAAAAATCAT 


CCATATTGGA


CATGGATGCC 


TTTTTTGCTG 


CAGTGGAAAT 


CAGGGATAAT 


360 


CCTAAACTCA 


GAGGAAAACC 


TGTCATTATT 


GGAAGCGACC 


CTCGGCAAAC 


AGGTGGACGG 


420 


GGAGTCGTTT 


CTACCTGTAG 


TTATGAGGCA 


AGAGCTTTTG 


GTGTCCATTC 


TGCCATGAGT 


480 


TCCAAGGAAG 


CTTATGAACG 


TTGTCCCCAG 


GCTGTCTTTA 


TCTCAGGGAA 


TTCGATGAGA 


540 


AATACAAGTC 


TGTGGGACTC 


CAGATTCGAG 


CTATTTTTAA 


GCGCTATACA 


GATTTGATTG 


600 


AACCCATGAG 


CATTGACGAA 


GCCTATTTGG 


ATGTGACAGA 


AAATAAACTC 


GGTATCAAGT 


660 


CAGCGGTCAA 


AATTGCTCGC 


CTCATTCAAA 


AAGATATCTG 


GCAAGAACTC 


CATCTAACTG 


720 


CTTCCGCAGG 


CGTTTCTTAC 


AACAAATTCT 


TAGCTAAAAT 


GGCGAGTGAT 


TATCAAAAAC 


780 


CACATGGTTT 


GACAGTGATT 


CTACCTGAAC 


AGGCTGAGGA 


TTTTCTCAAA 


CAAATGGATA 


840 


TTTCCAAATT 


TCATGGAGTA 


GGAAAAAAGA 


CAGTAGAACG 


TCTTCATCAA 


ATGGGCGTTT 


900 


TTACTGGTGC 


TGATTTACTT 


GAAGTTCCTG 


AGGTAACCCT 


AATAGACCGT 


TTTGGTAGAC 


960 


TAGGCTATGA 


TCTGTATCGA 


AAGGCTCGTG 


GCATTCACAA 


CTCTCCAGTC 


AAATCCAATC 


1020 


ACATCCGTAA 


ATCAATCGGC 


AAGGAGAAAA 


CCTACGGGAA 


GATTCTCCGT 


GCTGAGGAAG 


1080 


ATATCAAAAA 


AGAGAGCTGA 


CTCTTCTATC 


AGAAAAAGTC 


GCTCTCAATC 


TACATCAACA 


1140 


AGAAAAAGCT 


GGAAAAATTG 


TCATTTTGAA 


AATCCGCTAC 


GAGGACTTTT 


CAACTCTTAC 


1200 


CAAACGAAAA 


AGTATTGCTC 


AAAAAACACA 


AGATGCTAGT 


CAGATAAGCC 


AAATAGCCCT


1260 


GCAACTCTAT 


GAAGAATTAA 


GTGAGAAAGA 


AAGAGGTGTC 


CGCCTATTGG 


GGATTACCAT


1320 


GACTGGATTT 


TAAAG 










1335 



(2) INFORMATION FOR SEQ ID NO: 53: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1796 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53:



TCCAAGCTAG 


CTATTTCGTG 


GAAGGGGCTT 


CGGTTGGCAG 


AACCTGGTGA 


ATTTACCCAA 


60 


ACGTGCTTTT 


TTAAACGGTC 


GCGTAGACTT 


GACACAGGCA 


GAGGCTGTGA 


TGGATATCAT 


120 


CCGTGCCAAG 


ACTGACAAGG 


CCATGAACAT 


TGCGGTCAAA 


CAATTAGACG 


GCTCCCTTTC 


180 


TGACCTCATT 


AACAATACCC 


GTCAAGAAAT 


CCTCAATACA 


CTTGCCCAAG 


TTGAGGTCAA 


240 


TATCGACTAT 


CCTGAATATG 


ATGATGTTGA 


GGAAGCTACT 


ACTGCCGTTG 


TCCGTGAGAA 


300 


GACTATGGAG 


TTTGAGCAAT 


TGCTAACCAA 


GCTCCTTAGG 


ACAGCACGTC 


GTGGTAAAAT 


360 


CCTTCGTGAA 


GGAATTTCAA 


CGGCTATCAT 


TGGACGTCCC 


AACGTTGGGA 


AATCAAGCCT 


420 


TCTCAACAAC 


CTCTTGCGTG 


AGGACAAGGC 


TATCGTAACC 


GATATCGCTG 


GGACAACACG 


480 


AGATGTCATC 


GAAGAGTACG 


TCAACATCAA 


TGGTGTTCCT 


CTAAAATTGA 


TTGACACAGC 


540 


TGGTATTCGT 


GAAACGGATG 


ATATCGTTGA 


ACAAATCGGT 


GTTGAGCGTT 


CGAAAAAAGC 


600 


CCTCAAGGAA 


GCCGACTTGG 


TTCTACTAGT 


GCTAAATGCC 


AGTGAACCAC 


TGACTGCGCA 


660 


AGACAGACAA 


CTTCTTGAAA 


TTAGCCAAGA 


TACCAATCGC 


ATTATTCTAC 


TTAATAAAAC 


720 


CGACCTGCCA 


GAAACGATTG 


AAACTTCGAA 


ACTACCTGAA 


GACGTTATCC 


GTATTTCAGT 


780 


CCTTAAAAAC 


CAAAACATCG 


ACAAGATTGA 


AGAGCGAATC 


AACAACCTCT 


TCTTTGAAAA 


840 


TGCTGGCTTG 


GTCGAGCAAG 


ATGCTACTTA 


CTTGTCAAAC 


GCCCGTCACA 


TTTCCCTGAT 


900 


TGAAAAAGCA 


GTTGAAAGCC 


TACAAGCCGT 


TAATCAAGGT 


CTTGAGCTGG 


GGATGCCAGT 


960 


TGATTTGCTT 


CAAGTTGACT 


TGACTCGTAC 


TTGGGAAATC 


CTCGGAGAAA 


TCACTGGGGA 


1020 


TGCTGCTCCA 


GATGAACTCA 


TCACCCAACT 


CTTTAGCCAA 


TTCTGTTTAG 


GAAAATAAGA 


1080 


AAAATCCATG 


ATCCTTCATT 


CGGTCATGGA 


TTTTATTGTC 


TTTATTAGTA 


ATCTGGTCTT 


1140 


AAGACCCCTG 


TTACAGTTGC 


CTTAGTTGCT 


TCGTAGTCGC 


CATCTACGAC 


AACCTTGATA 


1200 


ATGCGTTTGA 


CATCTTCTTC 


TGGTGCTGGA 


ACAAGAGGTA 


GACGAGTGGG 


TCCAGCTTCA 


1260 


AATCCCATAT 


AGTTAAGAAT 


TGCCTTAACT 


GGAGCAGGAC 


TTGGATAAGA 


GAAGAGAGCA 


1320 


TTAACCTTAG 


GAATGAATTT 


ACGCTGAATT 


GCTGCGGCTT 


TCTTCATATC 


GCTTTCTGCA 


1380 


ATGGCAGTAA 


ACATCTCGTG 


CATTTCATCC 


CCATTTGTAT 


GAGAGGCAAC 


AGAAATAACC 


1440 


CCATCCGCCC 


CAAGGTTCAT 


GGCATGGAAA 


GCATCTCCAT 


CCTCACCTGT 


ATAAATCAAG 


1500 


AACTCTTCAG 


GCTTGTGCTC 


AATCAAGTAA 


GCCATATTAG


CCAAGCTAGT 


ACATTCTTTG 


1560 


ACACCGATAA 


TATTTGGATG 


GTCAGCCAAG 


CGAAGCATGG 


TTTCTGGAGT 


CAATTCGACA 


1620 


ACTACACGCC 


CTGGAATGTT 


ATAGATAATA 


ATTGGTAGGT 


CAGAAGCATC 


TGCAATAGCC 


1680 


TTAAAGTGCT 


GATACATCCC 


TTCTTGAGAA 


GGTTTGTTGT 


AGTAAGGAAC 


AATAGCAAGC 


1740 


CCAGCTGCGA 


AACCACCAAA 


TTCCGCTACT 


TCTTTGACAA 


ACTCAATAGA 


GTCACG 


1796 



(2) INFORMATION FOR SEQ ID NO: 54: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2337 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54:



CTTCGTACAG 


GTGGTTCCTA 


TGCAAGGGTG 


GAAGCCAATC 


GTCAGAACAA 


CAAGCATCTT 


60 


CATCAAGCCA 


GAACTGGAGC 


AATTACAAAA 


AGAAATTGCT


GAAGAAGAAG 


CAAGCTTGGG 


120 


TTCAGAAGAA 


GTGGCTTTGA 


AGACCTTGCA 


AGATGAGATG 


GCCAGATTGA 


CCGAGTCATT 


180 


AGAAGCTATT 


AAATCTCAAG 


GAGAGCAGGC 


ACGTATTCAG 


GAGCAAGGCT 


TGTCCCTCGC 


240 


TTATCAGCAA 


ACTAGTCAGC 


AAGTTGAAGA 


ACTGGAAACT 


CTTTGGAAAC 


TCCAAGAAGA 


300 


GGAAATAGAT 


CGTCTTTCCG 


AGGGAGATTG 


GCAAGCGGAT 


AAGGAAAAAT 


GCCAAGAGCG 


360 


TCTTGCTGCA 


ATCGCCAGTG 


ACAAGCAAAA 


TCTGGAAGCT 


GAGATTGAAG 


AGATTAAGTC 


420 


TAATAAAAAT 


GCCATCCAAG 


AACGCTATCA 


AAACTTGCAG 


GAAGAGCTAG 


CGCAAGCTCG 


480 


TTTGCTTAAG 


ACAGAACTGC 


AAGGGCAAAA 


ACGTTATGAA 


ATTGCTGATA 


TTGAACGCTT 


540 


AGGCAAGGAA 


TTGGACAATC 


TTGATTTTGA 


ACAAGAGGAA 


ATCCAGCGCC 


TTCTTCAAGA 


600 


AAAGGTTGAC 


AATCTTGAGA 


AGGTTGATAC 


AGAATTGCTC 


AGTCAACAGG 


CGGAAGAATC 


660 


CAAAACTCAG 


AAAACGAACC 


TCCAACAAGG 


TTTGATTCGC 


AAACAGTTTG 


AGTTGGATGA 


720 


TATAGAAGGT


CAGCTGGATG 


ATATTGCTAG 


TCATTTGGAT 


CAGGCTCGCC 


AGCAGAATGA 


780 


GGAGTGGATT 


CGCAAGCAAA 


CACGTGCTGA 


AGCTAAGAAA 


GAAAAGGTCA 


GCGAGCGCTT 


840 


TGCCGCCATC 


TACAAAGTCA 


ATTAACAGAC 


CAGTACCAGA 


TTAGCCATAC 


TGAAGCTCTA 


900 


GAAAAAGCGC 


ATGAATTGGA 


AAACCTCAAT 


CTGGCAGAGC 


AAGAAGTTAA 


GGATTTAGAG 


960 


AAGGCTATTC 


GCTCACTGGG 


TCCTGTCAAT 


ATAGAAGCTA 


TTGACCGGTA 


CGAAGAAGTT 


1020 


CACAACCGTC 


TGGACTTTCT 


AAATAGTCAG 


CGAGATGATA 


TTTTGTCAGC 


GAAAAATCTG 


1080 


CTCCTTGAAA 


CCATTACAAA


GATGAATGAT 


GAGGTTAAGG 


AACGCTTTAA 


ATCAACCTTT 


1140 


GAAGCTATTC 


GTGAGTCCTT 


TAAAGTGACC 


TTCAAGCAGA 


TGTTTGGCGG 


AGGTCAGGCA 


1200 


GACTTGATAT 


TGACTGAGGG 


CGACCTTTTA 


CAGCTGGTGT 


GGAGATTTCT 


GTTCAACCTC 


1260 


CAGGTAAGAA 


AATCCAGTCG 


CTTAACCTCA 


TGAGTGGTGG 


TGAAAAAGCC 


CTATCGGCTC 


1320 


TTGCCTTGCT 


TTTCTCCATT 


ATTCGTGTCA 


AGACCATTCC


TTTTGTCATC 


TTGGATGAGG 


1380 


TGGAAGCTGC 


GTTGGATGAA 


GCCAATGTTA 


AACGTTTTGG 


GGATTACCTC 


AACCGCTTTG 


1440 


ACAAGGACAG 


CCAGTTTATC 


GTCGTAACCC 


ACCGTAAGGG 


AACCATGGCA 


GCGGCCGATT 


1500 


CCATCTATGG 


AGTGACCATG 


CAAGAATCGG 


GTGTTTCAAA 


GATTGTTTCA 


GTTAAGTTAA 


1560 


AAGATTTAGA 


AAGTATTGAA 


GGATGACAAT 


TAAACTAGTA 


GCAACGGATA


TGGACGGAAC 


1620 


CTTCCTAGAT 


GAGAATGGGC 


GCTTTGATAT 


GGACCGCCTC 


AAGTCTCTCT 


TGGTTTCCTA 


1680 


CAAGGAAAAA 


GGGATTTACT 


TTGCGGTGGC 


TTCGGGTCGG 


GGATTTCTGT 


CTCTGGAAAT 


1740 


CGAATTATTT 


GCTGGTGTTC 


GTGATGACAT 


TATTTTCATC 


GCGGAAAATG 


GCAGTTTGGT 


1800 


AGAGTATCAA 


GGTCAGGACT 


TGTATGAAGC 


GACTATGTCT 


CGTGACTTTT 


ATCTGGCAAC 


1860 


TTTTGAAAAG 


CTGAAAACGT 


CACCTTATAT 


AGATATCAAT 


AAACTGCTCT 


TGACGGGTAA 


1920 


GAAGGGTTCA 


TATGTTCTAG 


ATACGGTTGA 


TGAGACCTAT


TTGAAAGTGA 


GTCAGCATTA 


1980 


TAATGAAAAT 


ATCCAAAAAG 


TAGCGAGTTT 


GGAAGATATC 


ACAGATGACA 


TTTTCAAATT 


2040 


TACAACCAAC 


TTCACAGAAG 


AAACGCTAGA 


AGCTGGTGAA 


GCTTGGGTCA 


ATGATAATGT 


2100 


CCCTGGTGTC 


AAGGCTATGA 


CAACTGGCTT 


TGAATCTATT 


GATATTGTTC 


TGGACTATGT 


2160 


CGATAAGGGT 


GTAGCTATTG 


TTGAATTAGC 


TAAAAAACTT 


GGCATCACAA 


TGGATCAGGT 


2220 


CATGGCTTTT 


GGAGACAATC 


TTAATGACTT 


ACATATGATG 


CAGGTTGTGG 


GACATCCTGT 


2280 


AGCTCCTGAA 


AATGCACGAC 


CAGAGATTTT 


AGAATTAGCA 


TAAGACTGTG 


ATTGGTC 


2337 



(2) INFORMATION FOR SEQ ID NO: 55: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2162 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 55: 



CTAAAAGTGA


AGCCCGATAG 


CGTCTCTCTC 


CTGCAAGGAT 


TTCATAACCA 


ATAACAGGAG 


60 


ATTGACGAAC 


AATAATCGGT 


TGAATGACCC 


CATTTTCTTT 


GATAGACTGT


GCTAGTTCAT 


120 


CTAGCTTTTC 


TCTATCAAAT 


TCTTTTCGGG 


GTTGATAGGG 


ATTTTTTTGT 


ATATCTGTGA 


180 


TAGAAATCAT 


TTCAAATTTT 


TCCATGATTC 


TACACTAACA 


CATCTTTTCT 


CTTATGTAAA 


240 


GCTTTCTTTA 


CATAGATGTC 


AATTAAGATT 


CTAAATCACC 


TGAACTCTTG 


TTAAGTTTGA 


300 


TAGAGGTAGT 


TTCTTCTTTC 


CCGTTACGAT 


AGTAGGTTAT 


CTTAATGGTG 


TCTCCGATAG 


360 


AATGGTTGTA 


AAGAGCACTT 


TGTAAGTCTG 


TTGATGAAGC 


AATCTCTTTG 


TCATCTACTT 


420 


TTGTAATTAC 


ATCGTATTTT 


TCAAGGTGAC 


CATTGGCAGG 


CATATTACTT 


TGTACCGAAC 


480 


GAACAATTAC 


ACCAGATGTA 


ACATTACTTG 


GAATATTGAG 


TCTTCTGATG 


TCGCTTGTAC 


540 


TCACATTAGA 


TAAATTAACC 


ATCTGGATTC 


CCAAAGCTGG 


ACGCGTCACT 


TTTCCGTTTT 


600 


TTTCTAACTG 


TTCAATAATA 


TTGATAGCAT 


CATTTGCAGG 


AATTGCGAAA 


CCAAGACCTT 


660 


CTACAGATGT 


TCCTCCATTT 


GTAGCAATTT 


TACTTGAGGT 


AATTCCGATA 


ACCTGCCCTT 


720 


GAATATTGAT 


CAGTGGGCCG 


CCAGAGTTAC 


CTGGGTTAAT 


AGCAGTATCA 


GTTTGGATGG 


780 


CTTTTGTAGA 


AATAGCTTGT 


CCATCTTCCG 


ATTTTAAGGA 


TACATTTCTA 


TTGAGACTGG 


840 


ATACGATACC


TTGAGTGACA 


GTATTTGCAT 


ATTCAGAACC 


TAACGGGCTA 


CCGATGGCAA 


900 


TAGCAGTTTC 


TCCTACAGTT 


AACTTACTAG 


AATCACCAAA 


CTCAGCTACT 


GTTGTCACTT 


960 


TTTCTGAAGA 


GATTTCGACG 


ACAGCAATAT 


CAGAGAAAGT 


GTCAGCTCCG 


ACAATTTCTC 


1020 


CAGGTACTTT 


AGTCCCATCT 


GACAATCGAA 


TATCTACTTT 


GCTGGCGCCA 


TTTATAACGT 


1080 


GATTGTTGGT 


GACGATGTAA 


GCTTCTTTAT 


CATTCTTTTT 


ATAAATAACT 


CCAGATCCTT 


1140 


CACTAGAGAT 


TCGCTGAGAA 


TCTGTGTCAG


TATCATCATT 


GCCAAATACG 


CTATTTTGTC 


1200 


TGTTTGCCGA 


ATAAGTAATA 


ACAGAAACAA 


CAGCATCTTT 


TACTTTGTTA 


ACGGCCTGTG 


1260 


TTGTTGAATT 


TTCCGTTCCT 


TATAGGCAGT 


TTGTGTAATA 


GTACTATTGT 


TGTTAGAGTT 


1320 


GTTTACACTA 


CTTTTTTGAG 


TTAGTTGAGT 


TATTGAAAAA 


CTACCCAAGG 


CTCCACTAAA 


1380 


AAAGCTAATG 


ACGATAACGA 


CTAATAATTG 


AAACCATTTT 


TTGTAAAATG 


TTTTTAGATG 


1440 


TTTCATATTT 


GCCTCCATAT 


GTTTGAATTA 


CTGAAAGTAT


AAACTGACTA 


GCTTAATTAT 


1500 


AACTTAAACA 


CAAAAGTTTT 


ACACAAACTG 


TGGATAACTC 


TTTTGAAACT 


GTGATTTTCT 


1560 


TAATTGAAAT 


CTATTTTTTA 


TTTTGTGAAT 


AAGATGTGAA 


AAAATAGAGA 


ATATGTTAGA 


1620 


ATAGAGTCAT 


GAAAATTAAA 


GTTGTAACAG 


TTGGGAAACT 


GAAAGAAAAG 


TATTTAAAAG 


1680 


ATGGTATCGC 


AGAGTATTCA 


AAACGAATTT 


CTAGATTTGC 


TAAGTTTGAA 


ATGATTGAGT 


1740 


TATCAGATGA 


AAAAACACCA 


GATAAGGCCA 


GTGAATCAGA 


AAATCAAAAG 


ATTTTAGAAA 


1800 


TAGAAGGTCA 


GAGAATTTTA 


TCAAAAATTG 


CTGACCGTGA 


TTTCGTTATT 


GTGTTAGCCA 


1860 


TTGAAGGGAA 


AACTTTCTTC 


TCAGAAGAAT 


TTAGTAAGCA 


GTGAGAAGAA 


ACTTCTATAA 


1920 


GGAAGGATGT 


CTACTCTTAC 


TTTTATTATT 


GGGGGAAGTT 


TAGGATTGTC 


ATCATCTGTA 


1980 
AAAAATAGAG CCAATCTTTC TGTCAGTTTT GGTCGCCTAA CCTTGCCTCA TCAGTTAATG 2040

AGACTAGTTC TTGTTGAACA AATCTATCGC GCTTTTACGA TTCAGCAGGG ATTCCCCTAC 2100 

CATAAATAGA GAATTGACTT TTAATTGAAT TTTTGGTAGA ATAATTGTGT TAGGTCTCAT 2160 

AG 2162 

(2) INFORMATION FOR SEQ ID NO: 56: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1766 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 56: 



ATCGAATTTT 


CCAAAATGGG 


GAGCTAGAGC 


AGTGGAGTGA 


TTATGTGGCA 


GACGATTTGA 


60 


TTCAGCATAA 


TCATGAGATT 


GGACAAGGAA 


GTGCTGCTTA 


TAAAAACTAT


GTGGCTGAAT 


120 


ATATTGTCAC 


TTTTGACTTC 


GTTTTCCAAC 


TCTTAGGACA 


AGGAAACTAT 


GTGGTTAGCT 


180 


ATGGTCAGAC 


TCAGATTGAT 


GGCGTTGCTT 


ATGCCAAGTA 


CGATATCTTC 


CGTTTAAAGA 


240 


ACGGGAAAAT


TGTGGAGCAT 


TGGGATAATA 


AGGAAGTCAT 


GCCTAAGGTA 


GAAGACTTGA 


300 


CCAATCGAGG 


GAAGTTTTAA 


ATTGAGGACA 


AAGAATGATT 


GAATACAAAA 


ATGTAGCACT 


360 


GCGCTACACA 


GAAAAGGATG 


TCTTGAGAGA 


TGTCAACTTA 


CAGATTGAGG 


ATGGGGAATT 


420 


TATGGTTTTA 


GTAGGGCCTT 


CTGGGTCAGG 


TAAGACGACC 


ATGCTCAAGA 


TGATTAACCG 


480 


TCTTTTGGAA 


CCAACTGATG 


GAAATATTTA 


TATGGATGGG 


AAGCGCATCA 


AAGACTATGA 


540 


TGAGCGTGAA 


CTTCGTCTTT 


CTACTGGTTA 


TGTTTTACAG 


GCTATTGCTC 


TTTTTCCAAA 


600 


TCTAACAGTT 


GCGGAAAATA 


TTGCTCTCAT 


TCCTGAAATG 


AAGGGGTGGA 


GCAAGGAAGA 


660 


AATTACGAAG 


AAAACAGAAG 


AGCTTTTGGC 


TAAGGTTGGT 


TTACCAGTAG 


CCGAGTATGG 


720 


GCATCGCTTA 


CCTAGTGAAT 


TATCTGGTGG 


AGAACAGCAA 


CGGGTCGGTA 


TTGTCCGAGC 


780 


TATGATTGGT 


CAGCCCAAGA 


TTTTCCTCAT 


GGATGAACCC 


TTTTCGGCCT 


TGGATGCTAT 


840 


TTCGAGAAAA 


CAGTTGCAGG 


TTCTGACAAA 


AGAATTGCAT 


AAAGAGTTTG 


GGATGACAAC 


900 


GATTTTTGTA 


ACCCATGATA 


CGGATGAAGC 


CTTGAAGTTG 


GCGGACCGTA 


TTGCTGTCTT 


960 


GCAGGATGGA 


GAAATTCGCC 


AGGTAGCGAA 


TCCCGAGACA 


ATTTTAAAAG


TGCCTGCAAC 


1020 


AGACTTTGTA 


GCAGACTTGT 


TTGGAGGTAG 


TGTTCATGAC 


TAATTTAATT 


GCAACTTTTC 


1080 


AGGATCGTTT 


TAGTGATTGG 


TTGACAGCTA 


CAATGACATT 


GGTCGGTTCC 


TTGAGCAAGA 


1140 


GATAGATTAG 


CCAGACAGTC 


ATGCCCAAAA 


TCCCTCCAGG 


TAAGAGCATA 


GACCGTTGCA 


1200 


CATTAAGTAC 


GATTAAAAAA 


GTGATAATGG 


CAAGAAAACT 


TGCTACTGCT 


TGTAATAAAA 


1260 


AGGTTGTTAG 


TGTCATATTA 


GTTCATCAAT 


ACCAAGGCGA 


CAGAAGTTCC 


TGCCCCTAAA 


1320 


GCGAGGGTAA 


TGAGCAGGGA 


TTCAAACATC 


TTACTCATAC 


CAGAGTTTAT 


GTGGTTGGTC 


1380 


ATAATATCAC 


GGACCGCATT 


GGTCAAGGCA 


ATACCTGGTA 


CAAACGGCAT 


GACCGCACCA 


1440 


GCTATAATCA 


AATCTGCCGT 


TGAAGGAAAA 


CCTGTGTAGC 


GAGCCCAAAA 


CTGGGCAATT 


1500 


ATCCCAAAGA 


CAAAAGCTCC 


AGCAAAGGCT 


GTCACAAAGG 


GAATTCGGAT 


AAATTTTTCC 


1560 


ACATAGAGGG 


AAAAGGCAAA 


ACCAAATAAG 


GTCGCCACTC 


CTGCCCCAAG 


TGCGTCGTAG 


1620 
ATATTTCCGC TAAACATAAC TGAAAAGAAA GGAGCACTAA AGGTCGCAGC CAGAGTTACC 1680
TGCAACTTAG TATAGGGAAG GGGTTGAGCT TGCAAGGCCG TCAATTGCTT AAAGGCTGTT 1740
TCTAAGTCAA TCTGCCCCCC AACTGG 1766

(2) INFORMATION FOR SEQ ID NO: 57: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1705 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 57:



CTCTGACGGA 


GGCTGGTTAT 


GTGGGTGAGG 


ATGTGGAAAA 


TATACTCCTC 


AAACTCTTGC 


60 


AGGTTGCTGA 


CTTTAACATC 


GAACGTGCAG 


AGCGTGGCAT 


TATCTATGTG 


GATGAAATTG 


120 


ACAAGATTGC 


CAAGAAGAGT 


GAGAATGTGT 


CTATCACACG 


TGATGTTTCT 


GGTGAAGGGG 


180 


TGCAACAAGC 


CCTTCTCAAG 


ATTATTGAGG 


GAACTGTTGC 


TAGCGTACCG 


CCTCAAGGTG 


240 


GACGCAAACA 


TCCACAACAA 


GAGATGATTC 


AAGTGGATAC 


AAAAAATATC 


CTCTTCATCG 


300 


TGGGTGGTGC 


TTTTGATGGT 


ATTGAAGAAA 


TTGTCAAACA 


ACGTCTGGGT 


GAAAAAGTCA 


360 


TCGGATTTGG 


TCAAAACAAT 


AAGGCGATTG 


ACGAAAACAG 


CTCATACATG 


CAAGAAATCA 


420 


TCGCTGAAGA 


CATTCAAAAA 


TTTGGTATTA 


TCCCTGAGTT 


GATTGGACGC 


TTGCCTGTTT 


480 


TTGCGGCTCT 


TGAGCAATTG 


ACCGTTGATG 


ACTTGGTTCG 


CATCTTGAAA 


GAGCCAAGAA 


540 


ATGCCTTGGT 


GAAACAATAC 


CAAACCTTGC 


TTTCTTATGA 


TGATGTTGAG 


TTGGAATTTG 


600 


ACGACGAAGC 


CCTTCAAGAG 


ATTGCTAATA 


AAGCAATCGA 


ACGGAAGACA 


GGGGCGCGTG 


660 


GACTTCGCTC 


CATCATCGAA 


GAAACCATGC 


TAGATGTTAT 


GTTTGAGGTG 


CCGAGTCAGG 


720 


AAAATGTGAA 


ATTGGTTCGC 


ATCACTAAAG


AAACTGTCGA 


TGGAACGGAT 


AAACCGATCC


780 


TAGAAACAGC


CTAGAGGTGA 


CTATGGAACT 


TAATACACAC 


AATGCTGAAA 


TCTTGCTCAG 


840 


TGCAGCTAAT 


AAGTCCCACT 


ATCCGCAGGA 


TGAACTGCCA 


GAGATTGCCC 


TAGCAGGGCG 


900 


TTCAAATGTT 


GGTAAATCCA 


GCTTTATCAA 


CACTATGTTG 


AACCGTAAGA 


ATCTCGCTCG 


960 


TACATCAGGA 


AAACCTGGTA 


AAACCCAGCT 


CCTGAACTTT 


TTTAACATTG 


ATGACAAGAT 


1020 


GCGCTTTGTG 


GATGTGCCTG 


GTTATGGCTA 


TGCTCGTGTT 


TCTAAAAAGG 


AACGTGAAAA 


1080 


GTGGGGGTGC 


ATGATTGAGG 


AGTAATTTAA 


CGACTCGGGA 


AAATCTCCGT 


GCGGTTGTCA 


1140 


GTCTAGTTGA 


CCTTCGTCAT 


GACCCGTCAG 


CAGATGATGT 


GCAGATGTAC 


GAATTTCTCA 


1200 


AGTATTATGA 


GATTCCAGTC 


ATCATTGTGG 


CGACCAAGGC 


GGACAAGATT 


CCTCGTGGTA 


1260 


AATGGAACAA 


GCATGAATCA 


GCAATCAAAA 


AGAAATTAAA 


CTTTGACCCA 


AGTGACGATT 


1320 


TCATCCTCTT 


TTCATCTGTC 


AGCAAGGCAG 


GGATGGATGA 


GGCTTGGGAT 


GCAATCTTAG 


1380


AAAAATTGTG 


AGGAAAAGAA 


AATGGCAAAA 


ACAATTCATA 


CAGATAAGGC 


CCCAAAGGCT 


1440 


ATCGGGCCCT 


ATGTTCAAGG 


AAAAATCGTT 


GGCAACCTTT 


TGTTTGCTAG 


CGGTCAAGTT 


1500 


CCCCTATCCC 


CTGAAACTGG 


GGAAATTGTA 


GGAGAGAATA 


TCCAAGAACA 


GACAGAGCAA 


1560 


GTCTTGAAAA 


ACATCGGTGC 


TATTTTGGCA 


GAAGCAGGAA 


CAGACTTTGA 


CCATGTTGTC 


1620 


AAAACAACTT 


GTTTCTTGAG 


CGATATGAAC 


GACTTTGTTC 


CTTTTAATGA 


GGTTTACCAA 


1680 
ACGGCCTTCA AAGAGGAATT CCCAG 1705

(2) INFORMATION FOR SEQ ID NO: 58: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1673 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 58: 



ACGTTTTGGG 


AACTGTTCGG 


ATAGCAGATT 


CCGAACAAAC 


TGATAATGGT 


TGGCAAAATC 


60 


ATTATTCCTA 


ATAGTAACGA 


AGCTGGTTAG 


GACAACTCAT 


GCCATTTCCT 


AAAAAGGTTT 


120 


TAATCCAAGG 


CACCAATAAT 


TGTAGGCCGA 


AAAAACCATA


AACAATAGAT 


GGAATGGCTG 


180 


CCATCAAGTT 


GATAGCTGAT 


TTTAAGAAGC 


TATAGACGGG 


CTTTGGACAA 


TTATAAACCA 


240 


TAAACACCGA 


TGTCAAGATC 


GCCTGTTGGC 


ACCCCAATCA 


CAATCGCTCC 


TAAGGTCGAA 


300 


TAAATAAGGA 


ACCAACGATC 


ATTGGTAAAA 


TACCATAGCT 


TGCCGGAATG 


TTCGTTGGCG 


360 


ACCAATCACT 


GCCTAATAAA 


AAACGGGCAA 


AGCCGTAGTT 


AGCTATGAAA 


GGTAAGCCAT 


420 


TACTAAAAAT 


AAAGAAACAG 


ATTAGCAAAA 


TAGCTACAAC 


AGCTACTGTT 


GCACTCATGA 


480 


AAAAAATTGC 


CCTAAAAACT 


GCTTCTTTGA 


AGGCTTGTTT 


TGTCACATCT 


TGTCCTTTCT 


540 


AGTGAAGAAA 


GTAAGGGAGA 


TACGACACCT


CCCTACTTGC 


CTTCTTTATC 


TTATTGTACG 


600 


ATGAAACGTC 


TGCATCTCTT 


TAGAGATTTA 


TGGAGCAAAC 


ATTTTATTTA 


ATCTTGTCCC 


660 


AGGTGGTTAA 


TTTGCCACTA 


AAAACGTCCG 


CAAGTTCAGC 


CATACTGACT 


TGGCTTGCCT 


720 


TATTGTCATT 


ATTGACCACA 


ACAGCAATAC 


CGTCTAAAGC 


AATAGCATCA 


TGGGTGAGAC 


780 


TCTTACCTTC 


TTCAGGAGTT 


AATTCCCTAG 


AAACCATACC 


AATATCAGCG 


GTTTTCTCCT 


840 


TAACAGCGGT 


AATACCTGCT 


GAAGACCCAT


TAGAGGTAAT 


ATCAATCGTA 


ACTTCTGGAT 


900 


TTTCTTTTTT 


ATAAGCTTCT 


GCTAATTTTT 


CCATTAAAGA 


AGATACTGAA 


GTGGAACCTA 


960 


CAACAGACAA 


CTTGCCTGAT 


AAGTGTTGGC 


TTGTATATTC 


TGTGGTTTCG 


GTTTTAGCTT 


1020 


CAATAAATTT 


ATTATCTGTG 


ACCACTTGTT 


GACCTTGTTT 


GGAGTGGATA 


AAGCTGATAA 


1080 


AATCTTGACC 


TAGCTTGGAA 


AGATTAGAAG 


ACCAAACAAT 


GTTGAAGGGA 


CGTTGAAGAG 


1140 


GGTATTCACC 


ATCTAAAACT 


GTGTCTCGAC 


TAGCCTTGAC 


ACCATCAATC 


TCTAAAGCCT 


1200 


TGACAGATTT 


CGTTAAAGAT 


CCCAAGGAGA 


TGTAGCCGAT 


AGCATTAGCA 


TTCCCTTGAA 


1260 


CTGCTGAGAG 


AACACCTTCT 


GTACTATTTT 


GAATCACAGC 


TGTTTTGGCA 


GTGTAGTCAA 


1320 


TTTTTTTATC 


ACCGTCTTTT 


TTGAGAATCC 


CTGTGATTTC 


TGTGAAGGCA 


CCCCGTGTTC 


1380 


CAGAGCCATT 


TTCTCGTGAA 


ATCACCTCAA 


TCGTTCCTGG 


AGCTGACTGT 


TTGGAAGCAG 


1440 


CTGACTGATT 


GCCACAGGCA 


ACAAGCCCAA 


ATCCTGATAA 


GCCAATGGCT 


GCAAGAGTAA 


1500 


GCATTTTTTT 


GAATTTCATA 


ATAATCACCT 


TTATCTCTAT 


GTATTTTTCT 


TGTGTAGGCT 


1560 


TACTACATTT 


ATAGTCTAAC 


AAGTCTTTGT 


AAAGGTTTAT 


CCCTGATTCA 


TGTAAAGATT 


1620 


GTGTAAAGAA 


TCAAAAAAAG 


CCACTTTTGA 


AAAATGGCTG 


CCCCTAAAAA 


TAG 


1673 
(2) INFORMATION FOR SEQ ID NO: 59:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1702 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 59: 



CTTTTTTATT 


TCACAACAAG 


TTCATAACGT 


GTCTTACTGG 


TGAAGGTTTG 


ACCAGCTTTA 


60 


AGAATGACTT 


GGCCTTTAAG 


GTCACTGTGA 


ATGGCATCTG 


GTAAAGCTTG 


CGCTTCAAGA 


120 


GCAATCCCAT 


TGTGCTGTAG 


CATTGGCTGA 


CCTCCTATGA 


TGACACTTTC 


ATCCACAAAG 


180 


TTTGCTGTGT 


AGACCACAAA


GCAAGGAGCT 


TCTGTCTTGA 


AAAGCAGGAA 


GCGACCTGAA 


240 


TTTTGGTCAT 


AAAGGAATCC 


AGCATTGTCA 


TGGCCTGCAG 


GAAGGGCAAA 


TGGATGATCC 


300 


AAACCTGATG 


CCAGCTGGAT 


TTGCTCATCT 


TCTTCTGCAA 


AGATATCCTT 


CAACAAGGCA 


360 


CCATTGTAGA 


TGTGTTTGAC 


CACATCACGG 


TTGGCTTCTG 


GAGTTTTGGC 


AGGAACACCG 


420 


TCAGGAGCGA 


TTGAGTAAAT 


GCCCTCTGTG 


TTTAGTTGGA 


AGACATGACG


GTCAATCGTC 


480 


TGCGTGAAAT 


CACCAGACAA 


GTTGAAATAG 


CTGTGGTTGG 


TTGGATTGAC 


CAGCGTATCC 


540 


TGATCGGTCG 


TTACCTTGTA 


GATCGAATTC 


ATGGAGGCAC 


CAGTTTCTTC 


CAAGTGATAA 


600 


CTGATCGCCA 


AATCTTGAGA 


TTTCCAGGGA 


ACCCTCCTGT 


CCCATCTGTA 


CGCTCTGTGT 


660 


AGAGAGTCAA 


GCCATGATCG 


CTTACTTCTT 


CAACTTCAAA 


CAAGCTGGAA 


TCCCAACCAG 


720 


TTGAACCACT 


GTGATTACAG 


TTGCTAGCAT 


TATTAACCTC 


AAGGTCATAG 


GTCTTACCAT 


780 


TGAGCTCAAA 


GGTCGCACCT 


GCAATACGAC 


CCGCTACAGG 


ACCTACACTT 


GCTCCATGCT 


840 


TGGGACTATT 


GCCTACATAA 


CTATCAAAGT 


CATCAAATCC 


CAAGATAACA 


TTGGCAAAAT 


900 


TTCCAGCCTT 


GTCAGGTGCG 


ACATAGCGCA 


AGATAGTCGC 


ACCATAAGTC 


ATAACCTCAA 


960 


GTTGGTAGCC 


ACCGTCTGTC 


TCAAATCGAT 


AGGCCAAGAC 


ATCCTCACCC 


TCAACATTTC 


1020 


CAAATACACG 


CTCTGTGTAT 


GCTTTCATTC 


TGTTCTCCTT 


TTACTATTTC 


TCTCAAGCAA 


1080 


ACAAACCATA 


GAAAGCGTAC 


TGACAATCTA 


TGGTTTATCT 


GATAATTTAC 


AAATCCTCTT 


1140 


GTCAAGAATT 


CATAAACACT 


GTCTTACTTT 


TGATATTCGT 


GAATTATGAC 


ACCTTGTACT 


1200 


ACACGGTTTA 


CTGTACCTGT 


AGGAGACGGT 


GTATCTGGTT 


TATTTTCTAC 


CTTGAGTGAA 


1260 


GTCAATAGGG 


CAAAGAGTTG 


GGCATAAACG 


ATGTAAGGGA 


AGACACGGTA 


AATATCATTC 


1320 


AAGACACCGC 


CACAACCAAG 


GGCCACTTCT 


TTGACATTTT 


CAAGACCAAA 


AGCTTGATCA 


1380 


CTCAAAAGCA 


CAACACGACG 


AGCAATCTGG 


TCACCAGCAA 


CTTCACGAAC 


CAAGTCCAAG 


1440 


TCGTACTTAC 


GAGTGTAGTC 


CGTCGTTGTA 


CCAAAGACCA 


AAACAACTGT 


ATTGTCGTTG 


1500 


ATAAGAGATT 


TTGGACCGTG 


ACGGAAGCCA 


ACTGGGCTTT 


CATACATGGT 


CGCAACTTGA 


1560 


CCAGCAGTTA 


ATTCCAAAAT 


CTTGAGCTGA 


GCTTCATGAG 


CAAGTCCAAA 


GAAAGGACCA


1620 


GCGCCTAGAA 


TAGATGACAC 


GGTTAAAGTC 


TAAATCAACG 


AGATCTTTGA 


CATCTTCTGC 


1680 


CTTGTCTAAA 


ACTTTACGGG 


CA 








1702 
(2) INFORMATION FOR SEQ ID NO: 60:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1940 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 60: 



TGCAGGATTT 


GATTTGGACG 


ACTTTTATTA 


TTACCAGATT 


CGCCTAGGAA 


TAGAAAAAAG 


60 


AGCCCAAGAG 


TTGGACTATG 


ATATCTTGCG 


CTATTTTAAT 


GACCACCCTT 


TTACCCTAAG 


120 


CGAGGAAGTG 


ATTGGGATTC 


TCTGCATCGG 


AAAGTTTAGT 


CGAGCTCAGA 


TTTCTGCCTT 


180 


TGAAGAATAC 


CAAAAGCCTC 


TTGTATTTCT 


AGACAGCGAT 


ACACTTTCCC 


TGGGACATAC 


240 


CTGTATTATC 


ACGGATTTTT 


ACACTGCTAT


GAAACAGGTT 


GTCGATTATT 


TCCTCAGTCA 


300 


AGGAATGGAC 


CGTATCGGGA 


TTCTAACAGG 


CCTTGAAGAA 


ACAACAGACC 


AAGAAGAAAT 


360 


CATTCAGGAC 


AAGCGTCTAG 


AAAACTTCAA 


AAACTACAGT 


CAAGCGAGGG 


GAATCTATCA 


420 


TGATGAACTG 


GTCTTTCAAG 


GAAGATTTAC 


TGCCCAGTCT 


GGCTATGACT 


TAATGAAGGA 


480 


GGCCATTCAG 


AGCTTGGGAG 


ACCAACTTCC 


GCCAGCATTT 


TTCGCAGCCA 


GCGATAGTTT 


540 


AGCTATCGGT 


GCCCTCCGTG 


CCCTCCAAGA 


AGCTGGAATC 


AGCCTGCCAG 


ATCGCGTCAG 


600 


CCTCATTTCC 


TTTAACGACA 


CTAGTCTGAC 


CAAACAGGTC 


TATCCTCCCC 


TCTCTAGTAT 


660 


TACAGTTTAT 


ACTGAAGAAA


TGGGCCGAGC 


AGGTATGGAT 


ATTCTTAACA 


AGGAAGTCCT 


720 


CCACGGTCGG 


AAAATCCCTA 


GCCTGACCAT 


GCTGGGAACC 


AGACTGACAT 


TAAGAGAAAG 


780 


TACCCTAAAT 


CAAGAATAGG 


ATAACATAAA 


AAACGAATAG 


AGTTCTAAAA 


CTCCTATTCG 


840 


TTTTTTATTC 


GATTACAATC 


ATAGACTTAA


TGGTCTTACG 


TTCATCCATA 


TCTTTGTAGG 


900 


CTTGGTCGAT 


ATCTTCCAGT 


TTATAACTTG 


AAGTAAAGAC 


GCGACCTGGA 


TTGATATCAC 


960 


CATCAAGGAC 


GGCTTTTAGT 


AAAAATTGCT 


TATCGTATGT 


TGTAGCAGAA 


GCTGCCCCAC 


1020 


CTGCTACAGA 


GATATTTTGC 


ATAAATGTCG


AACCAAGAGC 


ACGATTATTA 


TAGTGTGGGA 


1080 


CTCCTACAAA 


GCCCATACGC 


CCTCCATTAT 


GAAGAACACC


TAGCGCCTGT 


TCTATAGCAG 


1140 


CCTCCGTACC 


AACACATTCA 


AGTGCTGCGT 


CTGCTCCTCC 


GCCGAGGATT 


TCACGCACCT 


1200 


TGGTAATTCC 


TTCTTGACCA 


CGTTCTGCAA 


CAACAGCTGT 


CGCACCTGAC 


TCCATAGCCA 


1260 


TCTTTTGACG 


GTCTTCATGA 


CGGCTCATAA 


GGATAATTTG 


TGATGCTCCA 


CGCATCTTAG 


1320 


CCGCGATGAC 


AGCACATTGA 


CCAACAGCCC 


CATCACCGAT 


AACAACAACC


TTGTCCCCTT 


1380 


TTTGAACATT 


TGCAACACGC 


GCCGCATGAT 


AGCCTGTCGG 


CATGACATCT 


GCAAGAGTCA 


1440 


AAAGGGACTT 


GAGCATCCCT 


TCTGTATAGT 


CAGAAGGTTG 


ACCAGGGATT 


TTAACCAGCG 


1500 


CCCAGTTTGC 


ATAGTGGAAG 


CGAATATATT 


CTGCCTGAAA 


ATCACCCCCC 


AAATTATTGC 


1560 


CAATATGATT 


GTCGCAAGAA 


CCGTCAAATC 


CAGCAAGACA 


GGCATCACAC 


TCACCACATC 


1620 


CATGTGTAAA 


AGGGACAATC 


ACAAAATCAC 


CTGGTTTCAC 


CGTCGTAATG 


GCTTCCCCAG 


1680 


CTTCTTCAAC 


AATCCCAATC 


GCTTCGTGTC 


CACTTATTTT 


TTGTGTCCAA 


CTTTCGTTTT 


1740 


CCNTGGATTA 


CGGTACCTCC 


ATAAATTTGA 


ACCACAAACG 


CACGCACGAA


CCACACGAAT 


1800 


AATCACATCA 


TCCGCTTCTA 


TTATTTGCGG 


ACGTTCAATG 


CTAGCAAGTC 


CAACCTGACC 


1860 


TGCCTTTGTA 


TATACTGCTG 


ATTTCATTTA 


AAATTTTCCT 


TCCTTATAAA 


GTTTAATTTT 


1920 


GAGATTTAAA 


CGATTTAAAG 










1940 
(2) INFORMATION FOR SEQ ID NO: 61:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2051 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 61: 



ATCGAATTTT 


TCTAGCCAGG 


CTACAGTTTT 


GGCAAGTAAG 


GTTTCATCTC 


AGGCAGTCAA 


60 


CTGGGTGAGT 


GCCTTTATTA 


GCGGAGCTTC 


TCAAGTGATT 


GTTGCCTTGA 


TTATCGTTCC 


120 


TTTCATGCTC 


TTTTATCTCT 


TGCGTGATGG 


GAAAGGCTTG 


CGTAACTATT 


TGACCCAATT 


180 


CATTCCAAGA 


AAATTGAAGG 


AACCTGTTGG 


ACAAGTTCTA 


TCAGATGTGA 


ATCAACAGTT 


240 


GTCCAACTAT 


GTTCGAGGGC 


AAGTGACAGT 


GGCTATTATT 


GTAGCAGTAA 


TGTTTATCAT 


300 


CTTCTTCAAG 


ATTATTGGTC 


TACGCTATGC 


GGTTACGCTG 


GGGGTTACTG 


CTGGTATTTT 


360 


AAATCTGGTC 


CCTTATCTTG 


GTAGCTTTCT 


AGCCATGCTT 


CCTGCCCTAG 


TATTGGGTTT 


420 


GATTGCTGGT 


CCAGTCATGC 


TTTTGAAAGT 


AGTGATTGTC 


TTTATTGTAG 


AACAAACTAT 


480 


TGAAGGCCGT 


TTTGTCTCTC 


CATTGATTTT 


GGGAAGTCAA 


TTAAACATCC 


ACCCTATTAA


540 


TGTTCTCTTT 


GTTTTGTTAA 


CTTCAGGATC 


TATGTTTGGT 


ATCTGGGGAG 


TTTTACTTGG 


600 


TATTCCGGTT 


TATGCCTCTG 


CTAAGGTTGT 


CATTTCAGCC 


ATTTTCGAAT 


GGTATAAGGT 


660 


AGTCAGTGGT 


CTATATGAAT 


TAGAGGGTGA 


GGAAGTCAAG 


AGTGAACAAT 


AGTCAACAGA 


720 


TGTTACAGGC 


TTTGGAGGAG 


CAAGATTTAA 


CTAAGGCTGA 


GCATTATTTC 


GCCAAAGCTT 


780 


TAGAAAATGA 


TTCAAGTGAT 


CTTCTGTATG 


AGTTGGCAAC 


TTATCTTGAA 


GGGATTGGTT 


840 


TCTATCCTCA 


GGCCAAGGAA 


ATTTACCTGA 


AAATTGTAGA 


AGAATTTCCA 


GAGGTTCATC 


900 


TTAATCTAGC 


TGCAATGGCT 


AGCGAGGATG 


GTCAAATAGA 


AAAAGCCTTT 


AACTATCTTG 


960 


AGGAAATCCA 


AGCTGACAGT 


GACTGGTATG 


TCTCGCTCTT 


TGGCTCTGAA 


GGCAGACCTA 


1020 


TACCAGCTGG 


AAGGTTTGAC 


AGATGTGGCA 


CGTGAGAAAT 


TATTGGAGGC 


CTTGACCTAC 


1080 


TCAAAGGATT 


CTCTCTTGAT 


ATTGGGTTTG 


GCAAAGTTGG 


ATAGTGAGTT 


GGAAAATTAC 


1140 


CAAGCGGCTA 


TTCAAGCCTA 


TGCCCAGTTA 


GATAATCGCT 


CGATTTATGA 


GCAAACGGGC 


1200 


ATTTCCACCT 


ATCAACGAAT 


TGGCTTTGCC 


TATGCTCAGT 


TAGGGAAATT 


TGAAACGGCT 


1260 


ACTGAGTTTT 


TAGAAAAAGC 


CCTGGAGTTA 


GAATACGATG 


ACTTAACAGC 


TTTTGAGTTG 


1320 


GCCAGTCTTT 


ATTTTGATCA 


AGAAGAATAT 


CAAAAAGCCA 


CCCTCTACTT 


TAAGCAGCTT 


1380 


GATACCATTT 


CTCCTGACTT 


TGAAGGCTAT 


GAGTATGGGT 


ACAGTCAGGC 


TTTACATAAG 


1440 


GAACATCAAG 


TTCAAGAAGC 


CCTGCGTATC 


GCTAAGCAAG 


GATTAGAGAA 


AAATCCCTTT 


1500 


GAAACTCGCC 


TCTTGCTAGC 


TGCTTCACAA 


TTTTCTTATG 


AATTGCATGA 


TGCTAGTGGT 


1560 


GCAGAAAATT 


ATCTCCTTAC 


TGCAAAAGAA 


GACGCTGAGG 


ATACAGAAGA 


AATCTTGCTT 


1620 


CGTTTAGCCA 


CTATTTATCT 


GGAGCAGGAG 


CGTTATGAGG 


ATATTCTAGA 


CTTGCAGAGT 


1680 


GAGGAGCCAG 


AAAATCTTTT 


GACCAAGTGG 


ATGATTGCTC 


GTTCTTATCA 


AGAAATGGAC 


1740 


GATTTGGATA 


CTGCTTATGA 


GCATTATCAA 


GAGTTGACAG 


GAGATTTGAA 


GGACAATCCA 


1800 


GAATTTCTGG 


AACACTATAT 


CTATCTCTTG 


CGTGAATTGG 


GACATTTTGA 


AGAAGCAAAA 


1860 


GTCCATGCTC 


ACACTTACTT 


AAAACTGGTT 


CCAGATGATG 


TGCAAATGCA 


AGAACTGTTT 


1920 
GAGAGATTGT AAGAATGTTT AAACATATAG AACTGTAGTT TATCTCTTTT GATAGCTACG 1980
GTCTTTATTT GTACATGGTA GAATCTTTTT ACAAAAATAC TTGGTAATCT TGTTTATTCA 2040
TGCCATAATA G 2051

(2) INFORMATION FOR SEQ ID NO: 62: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1318 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 62: 



CTTTAGCAAT 


CAGTTTATTG 


GGAGATTTGA 


CTGCCACTTC 


TGTTGGAACC 


TTGATAATCT 


60 


TTTTACCCTC 


AAAGCGTTCC 


ATACCAGAAA 


TCTTAACATC 


AACTGCTAAA 


ATAACTACAT 


120 


CCGCTGCATC 


AATCTGCTCT 


TGACTCAATT 


CATTTTCTAC 


CCCTATTGTC 


CCCTGAGTCT 


180 


CAACATGAAT 


CACATGTCCA


GCTACCTTTG 


CGGCATTCTC 


TAATTTTTCC 


TGTGCAATAT 


240 


AAGTGTGGGC 


AATTCCCATA 


GTACAAGCTG 


CAACACCAAC 


AATTTTCATA 


CGGATACCCT 


300 


CCAAAATTTT 


TTCTTATTAA 


CAAAAAGCTG 


CAATCACATC 


ATCAGATGTC 


TGAGCCCGAA 


360 


CTAATTTGGC 


AACAACTTCG 


TCATTACCAA 


GTTTTCGAGC 


AAAGAGTGAT 


AAGGTCTTCA 


420 


AATGCTCCCT 


AGCAGCTTCT 


GTATCATCAC 


CAACTGCAAA 


GAGTACAATT 


ACTTTGACCC 


480 


CTTTCCCATC 


AATGGTCTCC 


CAAGGAATCT 


CATTGTGATT 


TATAGCTATG 


ACTACCCCCG 


540 


CCTTCTCCAC 


AGCAGAACTC 


TAGCTATGGG 


GAATAGCAAT 


ATAATTCCCA 


ATACCGGTCT 


600 


GTCCTTCTGC 


CTCTCTCTGA 


TAAAGACCTT 


CGATAAATTG


GTCTCTATCA 


GACACATAAC 


660 


CCGTCTCAAC 


CAATAGTATG 


AGCTAATGCC 


TCAAAAACCT 


CTTCTTTGCT 


CTGCATCTGT 


720 


AAATCCGTCT 


GGATCAGACT 


CACATTAAGA 


ATATCTTTGA 


TTTCCATATA 


TTATCTCCCG 


780 


TAATTCTTCT 


TTTGTTAACT 


GTTTTAATTG 


ATTTATGAAT 


GATTCATCTG 


CTAGTCTTCT 


840 


CATCAATGTT 


TTAATACATG 


ACTTGTCCTG 


TGATACTGCA 


ATGGCCAAAC 


CGATAATAAG


900 


GTCAACACAC 


TGGATATCCT 


TCGACCATTC 


TCTGATAGGT 


GGTTTTAATC 


TAGTAATCAC 


960 


TAAGACATGA 


TGTTGAAAGT 


TTCCTTCACA 


ATGTGGTAGA 


AGAACACCTT 


TAGCAACCTC 


1020 


TATACTTCCC 


TGTCTCTCAC 


GGTAATATAG 


AAGCTCTTCT 


ATTTTTTCTG 


TATCTTCAGA 


1080 


AACAAGAAGG 


CTGATTTGAT 


TTGCTAATTC 


TTTGTAGGCT 


TCTTGACGAT 


TTTGAACAGA 


1140 


TATATCCATA 


AGGACAAGCG 


AAAGATTATT 


CATAGTTTAT 


CTCCTGAATT 


TTTGCTTGAA 


1200 


GACGTTGTTT 


ATCACCCTCG 


GTTAGAAAAG 


CACTAACTAG 


GACAAACGGG 


ACACTTGCTG 


1260 


GTTCCTGCAA 


AGCTACCGTC 


GTCACAATGA 


AATCTAAATC 


TGGATATAGA 


TTTATCAG 


1318 



(2) INFORMATION FOR SEQ ID NO: 63: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2077 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 63: 



CTAGTCTTGG 


CTACTGTCTA 


AGTTGGCTTG 


TGCATAAGCC 


TGCCAGATTT 


TTTGTTGGGG 


60 


TTTGGCAAGT 


GGGTAATTCT 


TGAATTCTTC 


TGGTGAAAGC 


CAACGAACTT 


CCCTATCTGA 


120 


AAAATCATGG 


AAGTCACTCA 


CCTGACCTGC 


TACAATCTGT 


ACATGCCATT 


TTCGATGACT 


180 


AAAAACATGC 


TGGACTGTAT


CAAAACAAAC 


ATCAAGCCAA 


TCAACATCTA 


GGTCATAGTC 


240 


CTGCTGGAAA 


CTCTCTTCTG 


GGACTGGGGC 


CAGAGTTCAC 


ACTTTCTTCC 


GCAACCTGAT 


300 


GAAAGAGGTC 


AAACTGCTCT 


TCTTGCGAAA 


AGTTATCAAC 


TTCTATAAAG 


GGGAAATGCC


360 


AAAAACCTGC 


CAAGAGCTTT 


TCGCTTTCAT 


TTTTTTCAAG 


TAAAAATTGT 


CCTTGAGAAT 


420 


TTTTCACAAC 


TAAGGCTTTA 


AGATAAATAG 


GAACCGGCTT 


TTTCTTAGGA 


GATTTAATTG 


480 


GATAACGGTC 


CATGGTTCCA 


TTCTGATATG 


CCGCACTAAA 


GTCCTTGACT 


GGGCTTTCTT 


540 


CAGGTCTGGG 


ATTTACAGGA 


GACTCAATAT 


CAGACCCTAA 


GTCCATCAAG 


GCTTGATTAA 


600 


AATCACCCGG 


ACGATCTGGA 


TTAATCAAGA 


TCTCCATCAT 


TGCCTGAAAA 


ATTTTTCGAT 


660 


TACTTGGAAT 


CCCAATATCG 


TGGTTGACTT 


CAAACAGACG 


CGCCAAGACC 


CGCATGACAT 


720 


TACCATCTAC 


AGCTGGCTCA 


GGCAAGTTAA 


AAGCAATACT 


GGAAATGGCT 


CCTGCTGTGT 


780 


AAGGTCCAAT 


CCCTTTCAAG 


CTGGAAATTC 


CTTCATAGGT 


ATTTGGAAAT 


TGGCCACCAA 


840 


AGTCAGTCAT 


AATCTGCTGG 


GCTGCAGCCT 


GCATATTGCG 


AACTCGAGAA 


TAATAACCCA 


900 


AGCCCTCCCA 


AGCTTTCAGT 


AAACTCTCCT 


CAGGCGCAGT 


TGCCAGACTT 


TCGACAGTTG 


960 


GAAACCAGTC 


CAAAAATCTT 


TCGTAGTAAG 


GGATAACTGT 


ATCCACCCTG 


GTCTGCTGAA 


1020 


GCATGATTTC 


AGATACCCAG


ATGTGATAAG 


GATTTTTACT 


TCTCCTCCAA 


GGCAAATCTC 


1080 


TTTTGTTTTC 


ATCATACCAA 


GCGAGAAGTT 


TTCTCACCGG 


AAAGAAATGA 


CTTTCTCCTC 


1140 


CGGCCACATG 


ACGATACCGT 


ATTCTTTCAA 


ATCCTAACAT 


ATCTCTAGTT 


ATAACACAGA 


1200 


AGGTTTCACC 


TGTCTTTGTA 


TCTGATTTAT 


AATATTTTCA 


ATAGATAGTA 


TATAACTTTT 


1260 


CCTATCTACT 


TATACTCCAA 


TGAAAATCCA 


AAGAGCAAAC 


TAAGAAGCTA 


GCCGCAGGTT 


1320 


GCTCAAAACA 


CTGTTTTGAG 


GTTGTGGATA 


GAACTGACAG 


AGTCAGTATC 


ATATTACCTA 


1380 


CGGCAAGGTG 


AAGCTGACGT 


AGTTTGAAAA 


GATTTTCGAA 


GAGTATAAAT 


CTTATTGATG 


1440 


AACTGCTTGC 


AGTCTGAGAA 


AAAATGAGCT 


TGGATATTAT 


TTCCAAACTC 


ACTTAAAGTC 


1500 


AATTTCAATC 


CACTAGAACA 


AGCCTAGTAC 


AGTTCCATCG 


CTTTCAACAT 


CCATGTTGAG 


1560 


AGCTGCTGGA 


CGTTTTGGAA 


GACCTGGCAT


GGTCATAACA 


TCACCAGTTA 


AGGCAACGAT 


1620 


GAAGCCTGCA 


CCTAATTTTG 


GTACCAATTC 


ACGAATGGTA 


ATTTCAAAGT 


TTTCTGGTGC 


1680 


TCCAAGCGCA 


TTTGGATTGT 


CTGAGAAACT 


GTATTGAGTT 


TTAGCCATAC 


AAATTGGCAA 


1740 


TTTGTCCCAA 


CCGTTTTGAA 


CGATTTGAGC 


AATTTGTGTT 


TGAGCTTTCT 


TCTCAAAGTT 


1800 


CACTTTGCTA 


CCACGATAGA 


TTTCAGTGAC 


AATTTTTTCA 


ATCTTTTCTT 


GGACAGAAAG 


1860 


GTCATTATCG 


TACAAACGTT 


TATAGTTAGC 


TGGATTTTCA 


GCAATTGTCT 


TAACAACTGT 


1920 


TTCGGCAAGT 


GCTACTCCAC 


CTTCTGCTCC 


ATCAGCCCAG 


ACACTAGCCA 


ATTCAACTGG 


1980 


TACATCGATT 


GAGGCACAGA 


GTTCTTTTAA 


GGCTGCAATT 


TCAGCTTCTG 


TATCAGATAC 


2040 


AAATTCGTTA 


ATAGATACAA 


GCTAATGGAA 


TACCGAA 






2077 



(2) INFORMATION FOR SEQ ID NO: 64: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1887 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 64: 



CTCAAAACNC 


TGCTTTGAAG 


AGATTTTCAA 


AGAGTACAAG 


AAGTTTAGTT 


ATTAGCGTTC 


60 


TTACCGCTTG 


TAAACTAGAT


TTCTCATAAA 


ATAGAATCTT 


TTCCTTTTAG 


TTGTAAACTA 


120 


GTCTGGGAGA 


GTAGAGAGGT 


TTGAGATACC


TTTCTAGCTT 


TTGGATTATC 


ATCTAAGAAG 


180 


AGTAATTTCC 


CTTGCATTAA 


AAAGGGGAAA 


AAGAGACACG


AAATGACTAT 


AATGGGTGAC 


240 


AATGGGGGAA 


GGGATAGACA 


AGAGATTTTA 


TCCACATATG 


AAAAAAGGAG 


GTTAGGAAAG 


300 


AGTTATATAT 


CCTATATTAT


ATAAATAATC 


AATTGCGCAG 


AAATTTGGTA 


AGAATTCATG 


360 


CGTCAACTCA 


TAAAGAACTA 


CTTAAAAAAT 


TCACAGTATT 


CATAATTATT 


TTCGAGGAGA 


420 


AAAACAGTGA


AAAAAAGAAA 


AAAGCTTGCT 


CTGTCTCTTA 


TCGCTTTTTG 


GCTGACGGCT 


480 


TGTTTAGTAG 


GCTGTGCTAG 


CTGGATTGAT 


CGTGGAGAAT 


CCATAACGGC 


TGTTGGCTCA 


540 


ACTGCCTTGC 


AACCCTTGGT


TGAAGTAGCG 


GCAGATGAAT 


TTGGCACCAT 


CCATGTTGGA 


600 


AAAACGGTCA 


ATGTCCAAGG 


GGGAAGTTCT 


GGTACAGGCT 


TGTCCCAGGT 


TCAGTCTGGG 


660 


GCAGTTGATA 


TAGGAAACTC 


AGATGTATTT 


GCTGAGGAAA 


AAGACGGAAT 


TGATGCTTCT 


720 


GCTCTTGTTG 


ACCACAAGGT 


CGCGGTAGCT 


GGCTTGGCTC 


TGATTGTCAA 


TAAGGAGGTT 


780 


GATGTTGATA 


ACCTAACGAC 


AGAGCAACTT 


CGTCAAATCT 


TCATAGGTGA 


GGTAACCAAT 


840 


TGGAAAGAGG 


TTGGTGGTAA 


GGACTTACCC 


ATCTCTGTTA 


TCAATCGGGC 


AGCCGGCTCT 


900 


GGCTCTCGTG 


CTACCTTTGA 


TACTGTCATT 


ATGGAAGGTC 


AGTCTGCCAT 


GCAAAGTCAG 


960 


GAGCAGGATT 


CAAATGGAGC 


GGTAAAATCA 


ATCGTATCAA 


AAAGTCCAGG 


AGCTATCTCT 


1020 


TATTTATCTC 


TTACCTATAT


AGATGATTCG


GTCAAAAGCA 


TGAAGTTGAA 


TGGCTATGAC 


1080 


TTAAGTCCAG 


AAAATATAAG 


TAGCAATAAT 


TGGCCCTTGT 


GGTCTTATGA 


GCATATGTAT 


1140 


ACATTGGGGC 


AGCCCAATGA 


GTTGGCTGCA 


GAATTTCTCA 


ATTTTGTTCT 


CTCGGATGAG 


1200 


ACCCAAGAAG 


GGATTGTCAA 


AGGATTGAAG 


TATATTCCGA 


TTAAGGAAAT 


GAAGGTTGAA 


1260 


AAAGATGCTG 


CCGGAACTGT 


GACAGTGTTG 


GAAGGGAGAC 


AATAATGAAT 


CAAGAAGAAT 


1320 


TAGCTAAGAA


AATGTTGCTT 


CCATCAAAGA 


ATTCTCGTCT 


GGAGAAATTA 


GGAAAAGGTT 


1380 


TGACCTTTGC 


CTGTCTTTCT 


TTGATAGTCA 


TCCTTGTGGC 


CATGATTTTG 


GTTTTCGTAG 


1440 


CGCAAAAAGG 


CTTGTCGACC 


TTCTTTGTCA 


ATGGTGTGAA 


TATCTTTGAC 


TTTCTTTTGG 


1500 


GAGGAACTTG 


GAATCCTTCT 


AGTAAAGAAT 


TTGGTGCCCT 


TCCTATGATT 


TTGGGTTCCT 


1560 


TTATCGTTAC 


CATTCTCTCA 


GCCCTTATCG 


CAACACCCTT 


TGCTATTGGT 


GCAGCAGTTT 


1620 


TTATGACCGA 


AGTATCACCA 


AAAGGGGCGA 


AGATTTTGCA 


ACCAGCTATT 


GAACTCCTGG 


1680 


TTGGGATTCC 


TTCAGTAGTG 


TACGGATTTA


TTGGCTTGCA 


AGTCGTCGTT 


CCCTTTGTTC 


1740 


GCAGTGTCTT 


TGGTGGGACT 


GGTTTTGGGA 


TTTTGTCAGG 


GATTTCCGTC 


CTCTTTGTCA 


1800 


TGATTTTGCC 


GACCGTAACC 


TTTATGACAA 


CGGATAGCTT 


GCGTGCGGTT 


CCTCCNTTAT 


1860 


TATCGTGAAG 


CCAGTTTCGC 


TATGGGA 








1887 
(2) INFORMATION FOR SEQ ID NO: 65:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 405 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 65: 



CTGAGGAATC 


AAAAGTTGAA 


CCACCAGTAG 


AACAAGCATA 


AGTCCCAGAA 


CAACCCGTGC 


60 


AACCTACACA 


AGCTGAGCAA 


CCAAGTACAC


CAAAAGAATC 


ATCACAACAA 


GAAAATCCTA


120 


AAGAAGATAG 


GGGAGCGGAA 


GAGACTCCGA 


AACAAGAAGA 


TGAACAGCCA 


GCAGAAGCCC 


180 


AAGAAATCAA 


GGTTGAAGAA 


CCAGTAGAAT 


CTATAGAGGA 


GACTGTCATT 


CAACCTGTTG 


240 


AACAACCAAA 


AGTGGAAACG 


CCTGCTGTTT 


AATAACTAAC 


GGAACCTACA 


GAGGAACCTA 


300 


AAGTTGAAGT 


AACTAGTATT 


CCCCTCACTA 


CTCGCTATGA 


GGAAGACCTT


ACTTACGAAC 


360 


ACGGAACGCG 


TTGAAGTTGT 


TAAGGAAGGT 


TATAATTGGC 


AGTAT 




405 



(2) INFORMATION FOR SEQ ID NO: 66: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1542 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 66: 



CTATGGGATT 


GGTAGTTCTT 


CCTAGTGCAG 


GGGCTGTAGA 


CCCAGTTGCG 


ACCCTAGCGC 


60 


TGGACTAGTC 


GAGAGGGTGT 


TGTTGAAAAT 


GGATGGCTAT 


CGCTATGTTG 


GTTATCTATC 


120 


AGGTGACATC 


CTCAAAACGC 


TTGGCTTGGA 


CACTGTTTTA 


GAAGAAACCT 


CAGCAAAACC 


180 


TGGAGAGGTG 


ACTGTAGTCG 


AAGTTGAGAC 


TCCTCAATCA 


ACAACAAATC 


AGGAGCAAGC 


240 


TAGGACAGAA 


AACCAAGTAG 


TAGAGACAGA 


GGAAGCTCCA 


AAAGAAGAAG 


CACCTAAAAC 


300 


AGAAGAAAGT 


CCAAAGGAAG 


AACCAAAATC 


GGAGGTAAAA 


CCTACTGACG 


ACACCCTTCC 


360 


TAAAGTAGAA 


GAGGGGAAAG 


AAGATTCAGC 


AGAACCATCT 


CCAGTTGAAG 


AAGTAGGTGG 


420 


AGAAGTTGAG 


TCAAAACCAG 


AGGAAAAAGT 


AGCAGTTAAG 


CCAGAAAGTC 


AACCATCAGA 


480 


CAAACCAGCT 


GAGGAATCAA 


AAGTTGAACC 


ACCAGTAGAA 


CAAGCAAAAG 


TCCCAGAACA 


540 


ACCCGTGCAA 


CCTACACAAG 


CTGAGCAACC 


AAGTACACCA 


AAAGAATCAT 


CACAACAAGA 


600 


AAATCCTAAA


GAAGATAGGG 


GAGCGGAAGA 


GACACCGAAA


CAAGAAGATG 


AACAGCCAGC 


660 


AGAAGCCCAA 


GAAATCAAGG 


TTGAAGAACC 


AGTAGAATCA 


AAAGAGGAGA 


CTGTTAATCA 


720 


ACCTGTTGAA 


CAACCAAAAG 


TGGAAACGCC 


TGCTGTAGAA 


AAACAAACGG 


AACCAACAGA 


780 
GGAACCAAAA


GTTGAAGTAA 


CAAGTATTCC 


CCAAACTACT 


CGCTATGAGG 


AAGACCTTAC 


840 


TAAGGAACAC 


GGAACGCGTG 


AAGTTGTTAA 


GGAAGGTAAG 


AATGGCAGTA 


GAACAGTTAC 


900 


TACTCCATAT 


ATCTTGAATG 


CGACAGATGG 


TACGACTACA 


GAAGGCACTT


CGACAACTGA 


960 


TGAAGCTGAG 


ATGGAGAAAG 


AGGTTGTTCG 


TGTTGGCACG 


AAACCCAAAG 


AAAAATTAGC


1020 


TCCAGTCTTA 


AGTTTGACAA 


GTGTTACAGA 


TAATGCAATG 


TTGCGTAGTG 


CGAGACTTAC 


1080 


TTATCATTTG 


GAAAATACAG 


ATAGTGTTGA 


TGTGAAAAAA 


ATTCATGCTG 


AAATTAAAAA 


1140 


TGGCGATAAG 


GTTGTCAAAA 


CTATTGACTT 


ATCTAAAGAG


AGATTATCAG 


ATGCTGTTGA 


1200 


CGGTCTTGAA 


CTTTATAAAG 


ATTATAAGAT 


TGTGACGAGT 


ATGACCTATG 


ATAGAGGTAA 


1260 


TGGTGAAGAA 


ACCTCTACGT 


TGGAAGAAAC 


TCCACTACGA 


TTAGACCTCA 


AGAAGGTTGA 


1320 


ATTGAAAAAC 


ATCGGCTCTA 


CTAATCTCGT 


CAAAGTAAAT 


GAGGATGGTA 


CTGAGGTGGC 


1380 


AAGTGACTTC 


TTAACAAGTA 


AACCTGTGGA 


TGTGCAGAAT 


TACTACCTCA 


AAGTAACTTC 


1440 


CCGTGATAAT 


AAAGTTGTTT 


CCCCTCCCAG 


TTGAAAAAAT 


TGAAGAGGTG 


ACTGAGGAAG 


1500 


GTCCACCACT 


TTACAAAGTC 


CCTGCTAAGG 


CCCTAATTTG 


AT 




1542 



(2) INFORMATION FOR SEQ ID NO: 67: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1321 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 67: 



ATCGAATTAC 


TTCAACTCCA 


ACTTTACTCT 


CAATAAAAAT 


CAAATGTAAA 


AAGAGGAGCT 


60 


AAATTTATCT 


TTTTCTCCTC 


CTTCATCGTT 


CTTACTTTTG 


ACCATAATAA 


GCATTTGGTC 


120 


CATGTTTACG 


TTGGTAGTGT 


TTTTCTAGTA 


TGTACTGGGG 


AGCAGGTTCA 


ACTCTTGGAT 


180 


TGATTTGTTC 


TGTAAAGCGA 


TTCATCTTTG 


ATACTTCCTC 


TAGTACGACA 


GAGTGATAAA 


240 


CAGCATTCTC 


TGGATTTTTG 


CCCCAGGTGA 


ATGGACCGTG 


ATTGCGTACA 


ACAATTCCTG 


300 


GTACTTCAAC 


CGGGTTAAGT 


CCGCGATGTT 


CAAACTCTTC 


TACGATAACC 


AGGCCAGTAT 


360 


CTTTTTCATA 


GGCCACTTCT 


ACTTCGTCCT 


TGGTCAAACT 


ACGGGCGCAA 


GGGATTGAAC 


420 


CGTAGAAATA 


ATCTGCATGG 


GTTGTTCCGT 


AGAAAGGAAT 


ATCACGACCT 


GCCTGAGCCC 


480 


AAGCAACAGC 


TTCTGTCGAA 


TGGGTGTGAA 


CCACACTACC 


AATTTCTGAC 


CAAGCCTTAT 


540 


ATAATTGCAC 


ATGAGTTGGG 


AAGTCGGAAG 


ATGGTCTTAA 


ATCCCCTTAT 


AGGATCTTAC 


600 


CATCTAGATC 


AGTCACTACC 


ATGTTTTCAG 


GTGTCAATTC 


GTCATAATCC 


ACGCCTGATG 


660 


GTTTGATAAC 


AATGACACCG


AGTTCGCGAT 


TGACTTCAGA 


TACATTCCCC 


CAGGTAAATT 


720 


TGACAAGTCC 


ATGTTTTGGC 


AATGATTGAT 


TGGCATCACA 


GACTCGTTTA 


CGCATAGCAT 


780 


TGATTACTTG 


ATTCATCTTA 


CATCAAACCT 


GCTTTCTTAA 


TGAGTGGATA 


GAGAAAAGCT


840 


TGCGCCTCTT 


GAATGGCTGC 


GCGTGTTTCT 


TCTACTGTTT 


CACAATTTTC 


AGACCACATT 


900 


TCGATTAGGA 


AAGGTCCATT 


ATAATTGGTT 


TCCTTTAAAA 


TATCGAAAGC 


TTCTTCCCAT 


960 


TTGACACAAC 


CTTGCCCAAA 


AGGTACATCT 


CGGAACTGGC 


CCTTTGAACT 


TTCTGTCACT 


1020 


GCATAAGTAT 


CCTTGAGATG 


GAGAGTTGCG 


ATGGCATGAT 


GACCAAGATA 


AAACTCACTA 


1080 
TAGATATCAT TATGCCATGC AGACACATTA CCAATATCTG GATATACAAA GAGGAAGGGA 1140

GAGTCAATCT CTTTTTCTAT AGCCAAATAT TTTTCGATGC TATTGATGAA AGGATCATCC 1200

ATAATTTCAA TAGCAAGTAC CACCTGAGCT TCTTCAGCCC AGTCACAGGC TTTTCTCAAA 1260

TTTTTGATAA AACGTTGGCG TGTCTGGGGT GACTTTTCCT CATAGTAAAC ATCGTAACCA 1320

G 1321 

(2) INFORMATION FOR SEQ ID NO: 68: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1265 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 68: 



TTTTTCTGTT 


TTTCGGAGCA 


AACTGGGCTC 


CAGCCGGTTT 


TGGCCTTCTT 


TCCTTAGCTA 


60 


CAGCTGGTTT 


AGCTGGCTCA 


GATTTTTCGG 


CTTTCTTTTC 


TGCACTTACT 


TTTGGTGCTG 


120 


CAGGTTTTGC 


TTCTACTTTC 


GGAGCAGCTG 


CAGGCTTAAA 


GCTGGCAGCA 


ATTTTTGCAG 


180 


CGACAGCTTC 


TTCCACACTT 


GATGAGTGGC 


TTTTCACATC 


CAAGCCCAAC 


TCTTTTGCAC 


240 


GCGCTACAAC 


TTCTTTACTT 


TCTTTTCCAA 


GTTCTTTTGC 


GATTTCGTAC 


AATCTTTTCT 


300 


TAGACAAATC 


ATGTCCTCCT 


CTTCTATTCC 


ATAAGAGACC


TCATTTTCTT 


TGTAAATCCA 


360 


GCATCTGTTA 


CAGCCAAAAC 


CTTTCTCGAT 


TTCCCGACTG 


CTATGATTAA 


TTCCAGTGTT 


420 


GAAAACACGG


TTACAATTTC 


TACTTGATAA 


TAATGACTTT 


TATCTTGAAT 


CTTCTTGGTC 


480 


AGATTGGGTC 


CAGCATCATG 


AGCTAGAAAG 


ACCAACTTGG 


CCTTGCCGTC 


TTGAATGGCC 


540 


TTGACCACCA 


ATTCTTCACC 


CGATATGATG 


CGCCCTGCTC 


GCTGAGCAAG 


CCCCAAGAGA 


600 


TTACTTATCT 


TTTGCTTATT 


CAAGTCCCAA 


CTCTCTTCTT 


TTCACTTTGT 


GATCCACATA 


660 


AGCGATCAAC 


TCGTCATAAA 


AGCTTTCTTC 


CACTTCCATG 


CTAAAGCTGC 


GGTTAAAGAC 


720 


CTTCTTCTTT 


TTCGCCTCTA 


GGGCTTCTGC 


ATTGTCTAGT 


TTGATATAAG 


CGCCGCGGCC 


780 


ATTGGCCTTG 


CCCGTAGGAT 


CAATAAAGAC 


TTGTCCTTCC 


TTGTTCTTGA 


CAATGCGGAG 


840 


CAAATCACGC 


TTATCAATCA 


CTTCGTTAGA 


CACAACAGAC 


TTGCGCAAAG 


GGATTTTTCT 


900 


TGTTTTCATC 


TTTCCCTCCT 


CTAGCAGCTT 


TTATTCTTCT 


ACAGTATCGT 


TTTCTACTTC 


960 


CAACTCTACT 


GAAGCAGCGT 


CTTCCATGGC 


TTCAAATTCG 


CTAGCAGACT 


TGATATCGAT 


1020 


ACGGTAACCA 


GTCAAGTGAG 


CCGCCAAGCG 


CACGTTTTGT 


CCACGACGAC 


CAATGGCAAG 


1080 


AGAAAGCTTG 


TTATCTGGAA 


CAACCACCAA 


GGCACGTTTG 


CTGTCGTTTT 


CATCAAAGAT 


1140 


AACTTGGTCA 


ACCTCAGCAG 


GAGCGATGGC 


ATTGTAGATA 


AATTCAGCTG 


GATCTGCTAC 


1200 


CCACTCGATA 


ACATCGATAT 


TTTCTTCGAT 


TGGTACCATG 


CGGTCATTTT 


TAGCATCGTA 


1260 


ACGAG












1265 



(2) INFORMATION FOR SEQ ID NO: 69: 
(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 1305 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 69:



ATAAACCAAA


GGAAGCTGAG


CTCTTTAGTC 


CCAGCTTCTT 


TTTATATATA 


AAATTTTACC 


60 


CGTGAAAAGA


CAGGGCCTTA 


GCAGACTTCT 


TTTTTACTTC 


GTTCACCCTT 


GCTTTTTCTT 


120 


TGTATGTTTG 


GGCGTTGGCA 


GTTGGTTATA 


CATAGCTAAA 


ATCAGGTCTT 


ATAGAAACAT 


180 


CTTATTATCA 



AGTTCTTCCA 


CTCAAATCAT 


TTCTTTGGCA 


CCTTTGTATG 


GAAACTCAAA


240 


AGAAGATTGG 


TCAATCTTAT 


CTAAGACTGC 


TTGCACGGGT 


TTAACTAAAA 


GCGATCGTCA 


300 



TAAATGCCGC 


CAATAATCTT 


GCCGCGGAAG 


TAAAGAATAT 


ACTCCCCCAT 


CATGGAACGG 


360 



TAAGTCACAT 


CATCTAATCC 


TGATAATTGT 


TCCAAAACAA 


ATTCCAAATA 


GTTCTTACTT 


420 


GATGCCATTT 


CTAATCTTCT 


AGGCTCTGTT 


CAACGATAAC 


AACCGTATAG 


AGTTCTTGCT 


480 


TAACCTCGCA 


TCCAATTGAT 


TTAAAGCCCT 


GCTTTTCCCA 


AAAATGCTGA 


GATTGCGGAT 


540 


TTCCCTTAAC 


ATAAGCCAAA 


CGTGCCTTTC 


GAAAGTTCTT 


AGCAAAATAA 


GCTAGTGCTT 


600 


CTGTCACAAT 


ATGACTACCA 


ATCCCTTTCC 


TCTGATAGGC 


TTGATCAACC 


ATAAACAAAC 


660 


CAATAAAAAC 


AGTCTCCTCA 


TCAGGATATG 


CATAGACAAA 


ATCCATAACA 


GCCACAAGGT 


720 


CAAATCCATT 


CCAAAATCCA 


ACAAAAAACT 


TATCAGCCTT 


AGCTTTACCT 


TCAGGTAGAC 


780 


AAAGCATGTC 


CTCTTTTACA 


GTTGCAAAAT 


TTGGCTCTGG 


TGGACAATGC 


TGAAAATACA 


840 


GAGGATTACT 


TTCATATAAA 


GATAAAATAC 


TTGGAATATC 


CTTTTCAGTT 


AGTATCCTAC 


900 


AACTGTAATA 


CTTAGATAGT 


TGGTCAATCA 


TCTTTTCAAA 


TTCGATACTT 


TCTTGTGCCC 


960 


TGTGATTATG 


ACACAGGAAG 


ATGCACTGAT 


CGTCATCAGC 


CACATAAAAG 


TTCTTTCCAT 


1020 


CGTGCCTAAT 


CGTTGTCTCA 


AACCTTTGGA 


TAAAACCTTT 


AGCCTATACA 


ACTGGATTTT 


1080 


CCTCTCTCAA 


AAGTATATTC 


TTTTGCAGGC 


GAACTTCCTC 


AAAATCAGTC 


GTGTGCAACT 


1140 


TCAGTAGAAT 


ATTCATAGGC 


TCGGATAATC 


TGAGCGACAA 


CAGGATGGCG 


AACCACATCC 


1200 


TTGGCTGAAA 


AATGAACAAA 


GTCAATCTGA 


TGGATGTTCT 


TGAGTTTCTC 


TTGAGCATCA 


1260 


ATCAAACCGG 


ACTTGACATT 


ACGTGGCAGG 


TCAATCTGAC 


TAATA 




1305 



(2) INFORMATION FOR SEQ ID NO: 70: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1742 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 70: 

CTAATCTCCT 



TAAAACGTGA 


TCTTTTCAAG 


AATATTTTTA 


TCTAAACAAT 


CCAGCAAGTC 


60 


TTGGTAAGAA 



TAGACTTCGT 


AAGTCGGCTG 


GGCTTGTGTG 


TGATTTTCGA 


GGTGATGAGG 


120 



ATTATACCAG


ATAGTGTCAA


TCCCCGCATT 


ATTGCCACCT 


TGAATGTCGG 


CGGTTAGAGA 


180


ATCTCCAATC 



A 1 C AGCG 1C1 


TTTCTTTACT 



AAATCCAGCA 


ATTTGCTGGC 


CAATCTTTTC 


240 


ATAAAAAAGA 


/—i f~i TV m/^iOOO/^m 


TTTGAGTTTG 


CAACTGTTCT 


GAGATAAAGA 


CTTGATTGAA 


300


A 1 AAGGTGC I 


AbALLAbAl 1 



bAbLbAAALb 


TCCTGTCTGA



ATGGCAGTAA 


TGCCATTTGT 


360


CGCAGCATAC


AAbl 1A1AAI 



CACGCTCAAT 


GAGGCTGTCC 


AAGAGATCAT 


GAGCGCCCGA 


420


TAGTGTTTGT 


LLLlbL Ibbb 



LbAbb TAAAA 


TTGGTAACGC 



TGGGCAAGAA 


AACTACCGTC 


480 


TTTTTCCTGT 


CCAAAATGAG 



CAAATAAACG 


AGAAAAGCGC 


GTGTTAACCA


GCTCTTGTTT 


540 


ACTGATTTTC 


TTCAGCTCCA 



AGTCTTTCCA 


GAGAGCCTTG 


TTCATAGGAA 


CGTAATAATC 


600 



TTTATAAGCC 



GGAATATCCG 



CAACTCCTTC 


TTCTTTTAGA 


AGTGGAGTCA 


AAGCCACATC 


660 


CTCAGCAGCA 



TCAAAATCAA 



GAAGAGTGTG 


GTCGAGGTCG 


AAGAGTACAA 


ATTTGTAGAA 


720 



CAATTTGAGG 


TTTTCCTTTC 


TGAAAATTCA 


TTAAGAACAT 


TATATCATAA 


AGCACCTCAT 


780 



ACAATTAACT



AATTTAATCA 



CTTAAAAAAA 


ATTCGAACAC 


TTTCTATACA 


ACTGACAGCT 


840 


CAAATCTTTC 


AGAATAGAAC 


AATACTAACT


ATCGAACACC 


CCGTCTTCAT 


AAATACATAT 


900 


GTAATTCTAG 


GCCTAGAATT 


CCTATAAACT


AAATGCTTTC 


ATACTCTTCC 


AAGTAATTGA 


960 


TTGCCTTAAA 


TTTTAATTTT 


TGAAGGTTTC 


TAAAGCTAGA 


ATAGCCCCAT 


CACAATCAGT 


1020 


TTTGATTGAT 



TCACAATTTA 


GAAACACTAT


AGTTTCACTC 


CTGTTAAAAT 


AAAAAGGAAC 


1080 


TGCATAAAGC 


AATCCCTTTC 


TGATTTTGAA 


ATCATTTACT 


TAACATTTTA 


TAGTTGAGAT 


1140 



AATCAATAGC 



TTATCTATAA 


AAAGAGTTAT 


AGTAAAATTC 


CTTATTTATT 


GATTCCAAGC 


1200 


TCCGCTAACT 



GTATTTGAAT 


AACTGACAGT 


TCTGCACCAG 


CCTGAAAAAG 


AGCAGCTGCA 


1260 



TTATAGGCAC 



CTTCTACAAT 


TGGAACCCTG 


TTGATGATGA 


TACTTTTATC 


ACTGAAATCA 


1320 


GTCACCATTT 


TTAAGTTCAT 


TTTAGCAGAA 


CCTAGGTCAA 


AAAAGGCAAG 


TAAAGTATCT 


1380


GCTGGATTTT 



CGGAAACAAC 


CCTATCTACT 


TGATCAAAAC 


TCGTTCCAAT 


TCCTCCGCCC 


1440 


TCGGTTCCTC 


CTACATAAGT 


AATCGGAACA 


TCTTTAGCTA 


CTTTACTAAT 


CAGTTCAACA 


1500 


ACACCTTCTG 


CAATGTGTTT 


GGAATGTGAA 


ACGATAACAA 


GACCAATACC


AATACTTTCC 


1560 


ATCAAACCAC 


TCCAGTTTCT 


AAAATAGCAG 


TAAAGAGTAA 


TCCTGATGAG 


AATGATCCAG 


1620 


GATCAATATG 


TCCAAGAAAC 


CACATGCTCC 


TAAGACAAGA 


GCTAACAGAC 


TGGCCATCAA 


1680 


TAATAGTATT 


GTTCTTTTTT 


TCATCATTAC 


TCCTTAACTA 


GTGTTTAACT 


GATTAATTCG 


1740 



AT 1742 



(2) INFORMATION FOR SEQ ID NO: 71: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1136 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 71: 
GTGGAATGCG GGGACGCCTT GTCTAATTTT GGATCAAGCC CTGAGTTTGA CACAGGGAAA 60 

TGAGCTGGAC 


GGACTGCTAT 


CTCTGAAGAA 


ATTACTGGCA 


CCATTAGCCT 


ATCAGCCTTG 


120 



GATGATTATG 


TGGCGGCCTT 


GTCTCAACAG 


GATGTTCCCA 


AAGCTTTGTC 


TTGCTTGAAT 


180 


CTTCTTTTTG 



ACAATGGTAA 


GAGCATGACT


CGTTTTGTGA 


CCGATCTTTT 


GCACTATTTA 


240 


AGAGAC 1 Ibl 


TAATTGTTCA 



AACAGGGGGA 


GAAAATACTC 


ATCATAGTTC 


AGTCTTTGTA 


300 


GAAAATTTGG 



CACTTCCTCA 


AAAAAATCTG 


TTTGAAATGA 


TTCGCTTAGC 


AACAGTGAAT 


360 


TTAGCAGATA


TTAAGTCTAG 


TTTGCAGCCC 


AAGATTTATG 


CTGAAATGAT


GACCGTCCGT 


420 


1 1GGCGGAAA 


I CAAGCCCGA 


ACCAGCTCTA 


TCAGGAGCGG 


TTGAAAATCG 


AATTGCTACG 


480 


CTGAGACAGG 


AAGTTGCCCG 


TCTCAAACAA 


GAGCTTTCTA 


ATGCAGGTGC 


GGTTCCTAAA 


540 


LAAGTTGCAC 


CAGCTCCTAG


TCGACCAGCT 


ACGGGCAAAA 


CAGTCTATCG 


TGTCGATCGC 


600 


AA I AAAGTGC 


AATCTATCTT 



ACAAGAGGCC 


GTCGAAAATC 


CTGATTTAGC 


ACGTCAAAAT 


660 


CTAATTCGTT 



TGCAGAATGC 


CTGGGGAGAG


GTAATTGAAA 


GTCTAGGTGG 


GCCGGACAAG 


720 


GCTCTGCTAG 


TTGGTTCTCA 


ACCGGTTGCT 


GCCAATGAAC 


ACCATGCTAT 


TCTTGCTTTT 


780 


GAGTCTAACT 


TCAATGCTGG


TCAAACTATG 


AAACGAGACA


ATCTCAATAC 


CATGTTTGGT 


840 


AATATCCTCA 


GTCAGGCGGC 


AGGTTTTTCA 


CCTGAGATTT 


TAGCTATTTC 


CATGGAGGAA 


900 


TGGAAAGAAG 


TTCGCGCAGC 


CTTTTCAGCC 


AAAGCCAAAT


CTTCTCAAAC 


TGAAAAAGAA 


960 


GTAGAAGAAA 


GCCTGATTCC 


AGAAGGATTT 


GAATTTTTGG 


CTGATAAAGT 


GAAGGTAGAG 


1020 


GAAGACTAAA


GAAAGATTTC 


ATGATACAAT 


AAGTTTATGA 


ATAAACAACA 


ATTTATTATT 


1080 


ATGGCGCTAT 


TTACAGCTGC 


TGAGACCTAT


TTTTTCAATG 


AAGCCTGGAT 


GACTGG 


1136 



(2) INFORMATION FOR SEQ ID NO: 72: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1670 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 72: 



CTGTCTCTGA 


AACAGTCACA 


TCAAGTGCCT 


CTGAACAANC 


GCCCCNCCTA 


GGTNGACGGT 


60 


ATCGATAAGC 


TCGATCTGTG 


ATTTCAGAGA 


AGAAATCAAG 


TGCTGTAACA 


GAAGTAAGAT 


120 


GTAATTGTAT 


GTAAAGGAGA 


CGTCATGTTA 


AATAGTATTG 


TAACCATTAT


TTGTATTGCC 


180 


CTTATCGCGT 


TTATCTTGTT 


TTGGTTTTTC 


AAAAAGCCTG 


AAAAATCTGG 


ACAAAAAGCC


240 


CAGCAAAAAA 


ACGGATACCA 


AGAGATTCGA 


GTGGAAGTCA 


TGGGAGGCTA 


TACTCCTGAG 


300 


TTGATTGTCC 


TCAAGAAATC 


AGTGCCAGCC 


CGCATTGTCT 


TTGACCGCAA 


GGATCCTTCA 


360 


CCATGTCTGG 


ATCAAATTGT 


TTTTCCAGAT 


TTTGGTGTAC 


ATGCGAACCT 


GCCAATGGGG 


420 


GAAGAGTATG 


TAGTGGAAAT 


CACGCCTGAA 


CAGGCTGGAG 


AGTTTGGCTT 


TGCTTGTGGT 


480 


ATGAACATGA 


TGCACGGCAA 


GATGATTGTA 


GAGTAGGTGG 


AGACTATGAC 


AGAAATTGTG 


540 


AAAGCAAGCT 


TAGAAAATGG 


CATTCAAAAA 


ATCCGTATCC 


GAGCTGAAAA 


AGGCTATCAT 


600 


CCAGCCCATA 


TCCAGCTTCA 


AAAGGGAATT 


CCAGCTGAGA 


TTACCTTTCA 


TTCGTGCTAC 


660 


TCCTTCAAAC 


TGTTATAAGG 


GAAATTCTGT 


TTGAAGAAGA 


AGGTATCTTG 


GAAGCAATCG 


720 


GCGTAGATGA 


GGAGAAAGTC 


ATTCGTTTTA 


CACCTCAAGA 


ATTAGGGAGA 


CATGAATTTT 


780 
C rTGTGGCAT


GAAGATGCAA 


AAGGGAAGCT 


ATATAGTCGT 


TGAGAAGACT 


CGAAAATCTC 


840 


TATCTCTCCT 



bLAAALbTTT 


TTGGATTACT 


AGTATCTTTA 


CTGTGCCTCT 


TGTGATTCTC 


900 


AibrAl IbtbrbrA 


loll bjVjL. Abb 



TAGCATTAGT


CATCAAGTCA 


TGCATTGGGG 


AACCTTTTTA 


960 


vjL. aal ml CjiC 


CTATTATGTT 


AGTTGCGGGT 


AAGCCATATA 


TCCAGAGTGC 


TTGGGCCAGT 


1020 


mmm A A A A A r"" 1 r" 1 



ALAA 1 bjCCAA 


CATGGATACC 


TTGGTTGCGC 


TGGGAACTCT 


AGTGGCTTAT


1080 




1 Abi 1 1 brC 1 C 1 


CTTTGCTGGT 


CTCCCTGTTT 


ACTTCGAAAG 


TGCTGGATTT 


1140 


AiLLltl 1 1 1 


1 Cbr 1 1 C 1 1 1 1 


bbbAbbAbl X 


TTTGAGGAAA 


AAATGAGGAA 


AAATACGTCC 


1200 


LAAbL 1 b? 1 brbj 



AbAAA 1 1 AC 1 


bb AC T TGC AA 


GCTAAAACCG 


CAGAAGTCTT 


GAGTGATGAT 


1260 


Abi 1 iAlbiLL 


AAbj i Ibbl 1 1 


bbAACAAGTC 


AAGGTACGCG 


ACCTTGATTC 


CAGTGCGTCC 


1320 



Lbb 1 bAAAAb 


All bib. 1 bj 1 1 G 


A 1 GGTGTCGT 


AGTAGAAGGT 


GTCTCTAGTA 


TTGACGAATC 


1380 


CATGGTGACA 


GGTGAGAGTC 


TGCCTGTGGA 


CAAGACAGTT 


GGAGATACTG 


TCATTGGCTC 


1440 


AACCATCAAT 


CATAGTGGAA 


CGCTTGTCTT 


TAGAGCAGAA 


AAAGTTGGCT 


CAGAGACTGT 


1500 


TTTGGCTCAG 


ATTGTAGATT 


TTGTGAAGAA 


AGCTCAGACA 


AGTCGTGCGC 


CGATTCAGGA 


1560 


CTTGACGGAT 


AAGATTTCAG 


GGATTTTTGT 


CCCAGTAGTT 


GTCATTTTAG 


GAATCATGAC 


1620 


CTTTTGGGTT 


TGGTTCGTCT 


TGCTCAGGGA 


TAGTGTGGTC 


GTGCTTGGAG 




1670 



(2) INFORMATION FOR SEQ ID NO: 73: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1252 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 73:



ACAAGAACAA 


TTGGAACAGG 


TACAGGCTGT 


TAAAAAATCG 


ATTAACACAG 


CTAGTGAAGA 


60 


AGTGAAAAAC 


CAAGTCTTGC 


TACCCATGGC 


TGATCACTTA 


GTGGCTGCTA 


CTGAGGAAAT 


120 


TTTAGCGGCT 


AATGCCCTCG 


ATATGGCAGC 


GGCTAAGGGG 


AAAATCTCAG 


ATGTGATGTT 


180 


GGATCGTCTT 


TATTTGGATG 


CAGATCGTAT 


AGAAGCGATG 


GCAAGAGGAA 


TTCGTGAAGT 


240 


GGTTGCCTTA 


CCAGATCCAA 


TCGGTGAAGT 


TTTAGAAACA 


AGTCAGCTTG 


AAAATGGTTT 


300 


GGTTATCACA 


AAAAAACGTG 


TAGCTATGGG


GGTCATCGGT 


ATTATCTATG 


AAAGCCGTCC 


360 


AAATGTGACG 


TCTGATGCGG 


CTGCTTTGAC 


TCTTAAGAGT 


GGAAATGCGG 


TTGTTCTTCG 


420 


TAGTGGTAAG 


GATGCCTATC 


AAACAACCCA 


TGCCATTGTC 


ACAGCCTTGA 


AGAAGGGCTT 


480 


GGAGACGACT 


ACTATTCATC 


CAAATGTGAT 


TCAACTGGTG 


GAGGATACTA 


GCCGTGAAAG 


540 


TAGTTATGCT 


ATGATGAAGG 


CCAAGGGCTA 


TCTAGACCTT 


CTCATTCCTC 


GTGGAGGAGC 


600 


TGGCTTGATT 


AATGCAGTAG 


TTGAGAATGC 


CATTGTGCCT 


GTTATCGAGA 


CAGGAACTGG 


660 


GATTGTCCAT 


GTTTATGTCG 


ATAAGGACGC 


AGATGACGAC


AAGGCACTGT 


CTATCATCAA 


720 


CAATGCCAAA 


ACCAGTCGTC 


CTTCTGTCTG 


CAATGCCATG 


GAGGTTCTGC 


TGGTTCATGA 


780 


AGACAAGGCA 


GCAAGCTTCC 


TTCCTCGCTT 


GGAGCAAGTG 


CTGGTTGCAG 


ATCGAAAAGA 


840 


AGCTGGGTTG 


GAACCAATTC 


AATTCCGCCT 


AGATAGCAAA 


GCAAGCCAGT 


TTGTTTCAGG 


900 


TCAAGCTGCT 


CAAGCACAAG 


ACTTTGATAC 


CGAGTTTTTA 


GACTATATTC 


TAGCTGTTAA 


960 
GGTTGTGAGC AGTTTAGAAG AAGCGGTTGC GCATATTGAA TCCACAGTAC CCATCATTCG 1020

GATGCTATTG TGACGGAAAA TGCTGAAGCT GCAGCATACT TTACAGATCA AGTGGACTCT 1080

GCAGCGGTGT ATGTTAATGC CTCAACTCGT TTCACAGATG GAGGACAATT TGGTCTTGGT 1140 

TGTGAAATGG GGATTTCTAC TCAGAAATTG CACGCGCGTG GTCCAATGGG CTTGAAAGAG 1200

TTGACCAGCT ACAAGTATGT GGTTGCTGGT GATGGGCAGA TAAGGGAGTA AG 1252 

(2) INFORMATION FOR SEQ ID NO: 74: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1785 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 74: 

CTGCCCTAGC AGGAACGCAA GAAGGAACTG GAGAATAGGC ATTTTCAAAA TTATAACCTA 60 

CACTAGCCAT CATATCTAAT GTTGGAGTGC TAACTAGCTT ATCCTTACTA TTCAAGGATA 120

AGGCGTCTGC TCTCATTTGA TCTACAACAA TCAAAATAAT ATTTGGTTGT TTTGTCTGAA 180 

CCATAAAATC TCCTTTCTAA TATGGCAAAA GAGGCACAAG AAGATATCTA CCTTTACTGC 240

ACCCCTTTCT ATATCAATCT CTCTATATAA AGCAATAACA TTCTTGTTAT GTTTTATAGA 300

ACAATGGACT AAAATATGAC TAAATCGATT AGGAAATTCA AATCATTTTC TAGTACTGTT 360

TTAGTAAGTT ACAGTGTACT ATTCCAACTT CAATAAATTA TAAACCTTTG TCTAATAACA 420

ATTTTAGTGG AGATAAGAAA TCCTACACCT AACTCATCTT ACACGTAATC TATTTCTATT 480 

TTATCACAAA AAACGCAAGT AAGACCATTA ACTCAATTCA GTTTTATCTG CCATTTTCAC 540 

AAATGGGAAA TAAGTCAAGA CACTAATAAT CAAACAAACA ACTGATAAGA TGATGGCACG 600

CCAATCAAAT GCTGTAGAGA AGAAACCATA TAAAATTGGA GGCATTACCC AAGTAACATT 660

TTGTGTAACA GGTGAAACAA GACCCCAGCT TGTTGCCCAG TAAGCTACCG TTGCCATGAA 720

AACCGGGCTA AGTACAAATG GTATAAATAG CAAAGGATTC AAGACAACTG GTAAACCATA 780 

ATTCGATACC GGCTCACCAA TATTAAACAG AACTGGTGCT AGACCAAGTT TAGCAACTTT 840 

TCGATAATGA CTGTTTCTTG AAAAAATTAA AATAGCAAGT ACTAATCCTA ATCCTCCAAA 900

CCAGACAAAC GCCCCAAAAG ACCCACTTGT CCATATATAA GGAATCGGTT CACCTTTTTG 960

GAAAGCATCC AGATTCGCTA ACATAGCAAC TCCAAATAGC CCTTCCATGA TGGGAGCCAA 1020

TACATTTCCT CCATGGAGAC CAAAAAACCA GAATAACTTA TTCAAAAAGA TCATCAGAAT 1080

AACTGCAAAG AAACTTTGAG ACAAACCTAG TAATGGCGTT TGTAACACCT TGTAAACCCA 1140

ATCAATCAAT AAGTCATTGC TAAGTAAATG GAAAACATAA GTCAAGATGG CTACTATATA 1200

CATCGCCATA AATCCTGGAA TGATAGAAGT GAACGGCTTA GCAATCGCAG GGGGAACTGA 1260

ATCTGGTAAC TTGATTACCC AGTTCTTTTT CATTACTTTA CAGAAAATAA TAGAGGCTAA 1320

AAATCCAATC ATCATGGCTG TAAAGTAGCC TCTGGCATTA ATATGGTTTC CTGGAATCAC 1380

ATTCCCAATA GTTACCATCA GATTTTTACC ATCAAATGCT AGATTATCAA TTCCATGTTA 1440 

AGATTTGATC TAATTTCACA TCTCCTACAT TTGCCAAAGG GAAACTCTTT GTAACTGTAC 1500

TTCCAATCGA AATGACAAAC GAAGCAAGTG ATACCAAACC AGCAGAAACT GTATCAACCT 1560 
TGTAAATCTT AGCGATATTC ACTCCCAAGC AATAGATGAA CAACAAGGAA ACAATTGGTA 1620

TACTTCCCTT GAATACCAAA TTATTGATGT CAACAAGCCA CTGAAAGGTT TTCGTAATAC 1680

TTCCTAGGTG AAATTGTTGT GGTAAATCCA CTAGAAAAGC ATTTAATAAC AAAGCAATGG 1740

AACCTGTCAT AATAACAGGC ATAGTCCCCA CAAATGAATC ACGTT 1785

(2) INFORMATION FOR SEQ ID NO: 75: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1386 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 75: 

ATCGAATTTC ATTTCTATTT CCTATTCCAT TTTTATTCAA AAAATCAAAA AGCAAACTAG 60

AAAGCTGGTC GCTGGTGGTT CAAAACACTG TTTTGAGATT GTCAATAGAA CTGACAAACC 120

CTGTAATATA CCTGCATATA TACATACGAC AAGGCGATAC TACCCTAGTT TGAAGAGATT 180 

TTCGAAGAGT ATTCATTTTT GTCTTTTACT TATTATACCA TATTCACATA AAAAAACGAA 240 

CATTCTTATC CTAAAAAATG CTCATTTTTC TTAAATTATC AATCTAAATC TGGTTTATAG 300

AAGGAACGAT TATCCATAGC GAAGATTTTA TTGGTCATCT CTCCTTTATC CACCAAAGCC 360

AGAGCTGTTG ACATCATCAT CATGCTTGCA TCCAGATTGT CAATCATATG GATAATCTCT 420

GCCTCCATAA TACGTGGACG GACTGGAATT TCCATATTCA AGCAAGCCGT GGTGGACTTG 480 

AGGATGACAT GACGAAGCAA AACGACTTCT TCCTTGGTAT CATCGATGCC GAGTTCCATA 540 

ACTGTCTTGG TAATTTCGCT ATCAATGAGA GCGATATGTC CAAGAAGATT ACCTCGCACT 600

GTGTACTCTG TCTGGTCTGG CCCCGTCAAC TCGATAACCT TAGCTAAGTC ATGCAGCATA 660 

ATCCCCGCAT AGAGCAGGCT CTTATTGAGC TGAGGATAAA CTTCGCTAAT AGCGTCTGCC 720

AAACGTACCA TGGTCGCCGT ATGATAAGCC AACCCCGTTT CAAAGGCATG GTGGTTGGTC 780 

TTGGCGGCTG GATAGGAGTA GAATTCCTTA TCATACTTGG TGTAGAGATT TCGGACAATC 840 

CGTTGCCAGA CAGGATTTTC AATTTTGAAA ATCATTTGCG ACATGTAGTC ACGAATTTCC 900

TTGACATCAA CTGGTGACTT GACCTTGAAA TCAGCTGGGT CATTGGGTTC ACCAGCTTGA 960

GGCAGGCGGA GAGTAATTTG ATTGACTTGA GGGGTATTGT TATAAACTTC TCGGCGTCCT 1020

TTCATGTGGA CAACCTTACC TGCGGTAAAG GCCTCAATGT TATGAGGTTG GGCATCCCAG 1080

AGCTTCCCAT CAATCTCGCC ACTATCATCT TGGAAGGTAA AGGCTAGGTA GTTTTTCCCA 1140 

GCTCGAGTTT GCCTCAGGTC AGCTGATTTG ATTAGGTAAA AGCCTTCAAA TAACTCATCT 1200

TTTTTCATGT GACTAATCTT CATATTCTTC CTCATTTTCT TGAAAATGGA GTAGATCAAG 1260

CGCAGGCTCA CCTTCTGACA ACTCAATGTG ACGGAGCGTC CGCTCGATAG CTATGGTACG 1320

ACGGTTTAAT AATTCGATCA ATATTGCCAG AGGCATGTTG GAGATGTTTT TGTGCCTTGA 1380

CCAGAA 1386

(2) INFORMATION FOR SEQ ID NO: 76: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1167 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 76: 



CTCAGATTAC 


AGAGGACAAT 


CAACTGGTTC 


ATTTTCGTTT 


CCAGTTTCAA 


AAAGGCTTAG 


60 


AAAGGGAGTT 


CATCTATCGT 


GTGGAAAAAG 


AAAAAAGTTA 


AGGCAGGTGT 


TCTCCTCTAC 


120 


GCAGTCACCA 


TAGCAGCCAT 


CTTTAGTCTT 


TTGTTGCAAT 


TTTATTTGAA 


CCGACAAGTC 


180 


GCCCACTATC 


AAGACTATGC 


TTTGAATAAA 


GAAAAATTGG 


TTGCTTTTGC 


TATGGCTAAA 


240 


CGAACCAAAG


ATAAGGTTGA 


GCAAGAAAGT 


GGGGAACAGG 


TTTTTAATCT 


AGGTCAGGTA 


300 


AGCTATCAAA 


ACAAGAAAAC 


TGGCTTAGTG 


ACGAGGGTTC 


GTACGGATAA 


GAGCCAATAT 


360 


GAGTTTCTGT 


TTCCTTCAGT 


CAAAATCAAA 


GAAGAGAAAA 


GAGATAAAAA 


GGAAGAGGTA 


420 


GCGACCGATT 


CAAGCGAAAA 


AGTGGAGAAG 


AAAAAATCAG 


AAGAGAAGCC 


TGAAAAGAAA 


480 


GAGAATTCCT 


AGTCAATTCA


ACTATAATGC


GTTGAATCCA 


GAATAGTCCA 


CTGTAGTTTC 


540 


TAGAAAATTG 


CTGGAAATGG 


ATGTTAAGCT 


CCAATTCATT 


TGTTTATATC 


TTATTTCAGT 


600 


CCACTATACT 


TTGTGCTAAA 


TTAAAGATAT 


GAAACATGAT 


TTTAACCACA 


AAGCAGAAAC 


660 


TTTCGATTTC 


CCTAAAAATA 


TCTTCCTCGC 


AAACTTGGTA 


TGTCAAGCAG 


CCGAGAAACA 


720 


GATTGATCTT 


CTATCAGACA 


AAGAAATTTT 


AGATTTCGGT 


GGTGGCACGG 


GTCTATTAGC 


780 


CTTGCCCCTA 


ACCCCTAGCC 


AAGCAGGCTA 


AGTCAGTCAC 


TCTTGTAGAC 


ATTTCTGAGA 


840 


AAATGTTGGA 


GCAAGCTCGT 


TTGAAAGTGG 


AGCAGCAAGC 


AATCAAGAAT 


ATCCAGTTTT 


900 


TGGAGCAAGA 


TTTACCGAAA 


AATCCCTTGG 


AGAAAGAGTT 


TGATTGCCTT 


GCTGTTAGTC 


960 


GGGTTCTTCA 


TCATATGCCT 


GATTTGGATG 


CGGCTCTCTC 


ACTGTTTCAT 


CAACATTTGA 


1020 


AGGAAGATGG 


GAAACTCATC 


ATTGCTGATT 


TTACCAAGAC 


AGAAGCTAAT 


CATCATGGAT 


1080 


TTGATTTAGC 


TGAACTGGAA 


AACAAGCTAA 


TTGAGCATGG 


GTTTTTCATC 


TGTGCATAGT 


1140 


CAGATNCTCT 


ATAGCGCTGA 


AGANCTG 
(2) INFORMATION FOR SEQ ID NO: 77:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 916 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 77: 

TCTCCCAACA TATAATTTCC GTTTTCCAAT CCCCCAGCTG TCATACAGTC TGTGATAAGA 60

GCGATGTTTT CTGTTCCTTT TTGTTTGATA AGAATTTCGC AAGCCTTTGG ATCTACGTGG 120
1 GACCATLAL


AGATLAACTC 


TGCATAGGTA 


TGTGGCAATT 


GGTACATGGC 


TCCAACCATA 


180 


LLLAA1 ILAL 


LjLj 1 LtALt 1 LAA 


CCCACGCATT 


CCATTGTAGG 


CATGCACCCA 


AACACTCGCT 


240 


PPAPPA r Pff"' A 

LLAbLAi HjA 


L. 1 bjb, 111111 


GGCTTCATCA 


AAAGTCGCGT 


TTGAATGTCC 


AAGAGCAACC


300 




LLjv-LLLj 1AAL 


1 bl AbbAACA 


AAGTCTTCCA 


CCCCATCACG 


TTCTGGTGCA 


360 


AiUbAAl 111 


a rnrn a aPP7\ a r~< 
Al 1 AAbrb, AAbi 


L.CATTTGCCG 


CTTTTTGCCA 


AGAATGAAAC 


TCCTCAACAC 


420 


v.. L bibrbi it J.L1 


A T 1 A T A A prnm 
A 1 A 1 AAb 1 1 


LtL-tAI 1 TTGTG 


CCCCCTTAAA 


AGTTTCTGTG 


AAATATGGAC 


480 


/-trprpz-i a rn a a rp a 
til b. A 1 AA 1 A 


AA1 LLL.AL.LrA 


AIL. T\L AGC AC 


CTGTTGCTTC 


TTTATAATGG 


TTTCCAAGAT 


540 


111 L Ab? 1 bAt 


1 bjb.AAb7L.AA 1 


PG C T C AT AAG 


TGGCTGTTAA 


AGTTGTGGGT 


AAGAAACTGG 


600 


1 AAL AL.C_.bjO 1 


a om a a 0 a a r~* m 
AL 1 AAbiAAGT 


CCTTCACTCA 


TAGTATGCAA 


TGTACCTTCA 


ATGTTGTTGT 


660 


CCATCACATC 


TACACCTGCA 


TATCCATGAA 


TATGAGTATC 


CACAAGACCT 


GGGGCAATGC 


720 


TATAACCTGT 


ATAGTCAATC 


ACCTCAGCCC 


CTTCAGGAAT 


CTGCTCTACA 


TGTTTCCCAA 


780 


ACTTGCCGTC 


CACAAGTTCC 


AAGTAACCAC 


CTCGACAAAT 


CCGTGTGGGT 


AGAAAAACTG 


840 


ATCCGCTTTA 


ATATAGTTAG 


GCATAATGTT 


AACCTCCTTA 


AAAGATTGAT 


TCTACAATTT 


900 


ATTATGTCAA 


TTCGAT 
(2) INFORMATION FOR SEQ ID NO: 78:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 786 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 78:



CTGGATTAAA 


ACGAGGCAGT 


TTCAGACTAA 


TATCCAAGTC 


GTAAGAAATG 


CCTGAAATAA 


60 


GCTTTTCTAA 


ATTGTCCAAA


GCTTGCGGGA 


AAACGCTCTT 


GGAATAGTTT 


CTCTAAAGAA 


120 


CTTGCTGATA 


TAAAGACATC 


TTGTCTCGAA 


CGCAAGGGAA 


CTTCTCTGAG 


CGGTAGATTT 


180 


TCTTTAATCG 


CTGTTAAAAC 


TTGAAGAACT 


TCTCTATCCC 


TGCTTTCAAA 


AGCGTTGACC 


240 


CGATAAAGAG


GTAAGATAGG 


ATGATGAAAT 


TCGCTTGCTA 


GTGTTTCTGG 


ATAAACCCCT 


300 


ATATAGTAAT 


CACAGCCTAG 


TTCTAACGAC 


TCAACTCTAT 


CAAAATAAGG 


CACAATGACC 


360 


GCGATATCCT 


CCAGGTACTG 


GGACAGGACT 


GACCAAGTTT 


TCTCCCCCTG 


CATCTTGGCT 


420 


GTCGAAAGCT 


TCATCAACTG 


CTGATAGCCC 


ACACTAGATA 


GAGCTAAAAA


GCGCAAATTC 


480 


ACTTCCTGAT 


CATCTACAAA


CACTGTCATT 


TCAAGCCCTA 


GCAAAGGATG 


AATGCCGTAT 


540 


TTTTTTGTAA 


TCTCTAGAAA 


GTCGAAAGCG 


CCATAAAGAT


TGTCAATATC 


CATCATAGCC 


600 


AAATGAGTGT 


AGCCGTATTC 


TTTAGCTGCT 


CTCACATACT


TTTCGATCGA 


AATGACGCTT 


660 


TCCATAAAAC 


TATAGACTGT 


TTTTGTATCT 


AGTTGTGCGA 


TCAATTTACA 


CTTCTCCTCT 


720 


ATCCTTCTCA 


CTATATTATA


CCATTTTCAC


CTATAAATGG 


CTTCTCTTGA 


GAAAAATTTC 


780 


GATCAG 
(2) INFORMATION FOR SEQ ID NO: 79:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1213 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 79: 



CACTTTCAGC 


TTCTTCTCTT 


TTTGAACGGT 


TATAAACACG


AATCAGATTC 


CCTATTTCTT 


60 


GCGATTTATG 


TGATTCCTTA 


TTTTCCAATC 


TAAAGTATAG 


TGAAATGAAA 


TAAAACATGC 


120


GCAAATCGAT 


TAAGGAATTT 


AATCTAATTT 


CTAACAATGT 


CTTAGAAATC 


AAAGTGTACT 


180 


ATTTTAACTT 


CAATGCACTA 


AACATCTAAT 


ACTCAATAAA 


AATCAAAGAG 


CAAACTAGGA 


240 


AACTAGCCGC 


AGGTGGCTCA 


AAACACTGTT 


TTGAGGTTGT 


AGATGAAACT 


GACGAAGTCA 


300 


GTAACCATAC


ATACGGCAAG 


GCGACGCTGA 


CGTGGTTTGA 


AGAGATTTTC 


GAAGAGTAGC 


360 


AAAATGGAAA 


AAGGAGTGAG 


TGAAGCACAT 


CGCCTCCCCA 


CTCCTTTTTC 


TGTTTTTAGG 


420 


CTGTTTTTTC 


AACCTTCAAG 


ATTTTTACAT 


CATAGCTACC 


AACAGGCGTT 


TCAATGGTTG 


480 


CTGTATCACC 


TGTTTTCTTG 


CCAATCAAGG 


CCTGCCCAAT 


TGGGCTTTCA 


TTTGAAACCT 


540 


TACCTGCAAA 


GGCATCCGCA 


CCAGCTGAAC 


CTACGATAAT 


ATAAACTTCT 


TCTTCGTCCT 


600 


CACCAATTTC


TTGGATGGTG 


ACTGTTTTAC 


CAATCGCTAC 


TTCGTCCTGG 


GCAACTGCGT 


660 


CGCTATTGAC 


GATTTCAGCA 


TAGCGGATTT 


TTGTTTCTAA 


GCTAGAGATT 


TGTCCTTCGA 


720 


CAAAGGCTTG 


TTCATCCTTA 


GCTGCTTCGT 


ACTCACTGTT 


TTCTGAAAGG 


TCACCGTATG 


780 


AACGGGCAAT 


CTTAATGCGT 


TCTACCACTT 


CTGGTCGACG 


AAACCAATTT 


CAATTCTTCT 


840 


AATTCTTTTT 


CAAGTTTTTC 


CTTTTCCTCA 


AGGGTCATAG 


GATATGTTTT 


TTCTGCCATT 


900 


TTTCTCAACT 


TTCTTCTGAT 


AATATTTTCT 


AAAGAAAATT 


ATGTGAAGTA 


TCACATAATT 


960 


TTAGTTTGTT 


TAGTTTAATT 


TGCTGTTGAC 


ATGTTCAGCG 


ACATTGCGGT 


CGTGGTCTTC 


1020 


TTGATTGTTA 


GCATAGTAAA 


CCTTGCCTTC 


TGTGACATCT 


GCTACAAAGT 


AAAAGTTATC 


1080 


GCTCTTAGTT 


TGATTGATGC 


TTGACTCAAT 


CCGCATCCAA 


GACTTGGACT 


ATCGACTGGA 


1140 


CCAGGCATGA 


GACCTACATT


TTTATAAACA 


TTATAAGGTG 


AATCAATGTT 


GGTATCAATC 


1200 


GCAACATCCT 


CAG 
(2) INFORMATION FOR SEQ ID NO: 80:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1173 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 80: 
TGPGGPTGAG


x x 1 




TAAGCGTGTA 


TCGGTGACAC 


CTATTTCTCT 


60 


GATTGGGGCA 


GPGAPAPATP 




m /^i nn m /"-"i m v-i 

L. 1 GGTTCTGG 


CAAAAGCGCT 


TGATAAGGCT


120 


GCGAAAGAGA 


TTGGTGTPPA 

X X \JVJt X \J X \JVJ/Tk 


V— X X X A X X VJVJ 1 


LtQj 1 CTTTCTG 


CCTTAGAACA 


AAAAGGTTAT 


180 


CAAAAGGGAG 


ATGAGATTPT 

" x w^ivX^I X X V — X 


P A TP A A TTPP 


Ail LL 1 CGCG 


CTTTGACTGA 


GACGGATAAG 


240 


GTCTGCTCGT 


CAGTCAATAT




AACj 1 l 1 GG 1 A 


TTAATATGAC 


GGCTGTGGCA 


300 


GATATGGGAC 


GAATTTATP A 

w.rxxx xxx n x \— xi 


A PGA A APPPr 1 
nuvariflrt L. u Lj L 


AAA 1 L 1 1 1 L. A 


GATATGGGAG 


CGGCCAAGTT 


360


GGTTGTATTC 


GCTAATGCTG 


TTPAPPAP A A 


rripr»j> mmm 71 mp 

IllAI 1 lAlb 


GCGGGTGCCT 


TTCATGGTGT 


420 


TGGGGAAGCA 


GATGTT ATP A 

x vj x x xx x \— 


TP A ATPTPPr* 


ALtI 1 1CTGGT 


CCTGGTGTGG 


TGAAACGTGC 


480 


TTTGGAAAAA 


GTTCGTGGAP 


AG A PPTTTr: A 


1 Lt 1 1 ALt 1 AAC 


CCGAAAACCA 


GTTAAGAAAA 


540 


CTGCCTTTTA 


AAATPAPTPP 


PTATPPPr^Tr 1 


L.AA1 1GGTTT 


GGTCAAATGC 


CCAGTGAGAG 


600 


ACTGGGTGTG 


GAGTTTGGTA 


TTPTPP APTT 
X X Li X VjLrr\L X X 


LtALt 111 GGCA 


CCAACCCCTG 


CGGTTGGAGA 


660 


CTCTGTGGCA 


CGTGTPPTTP 


APP A A A TTPP 


LtL 1 AGAAACA 


GTTGGCACGC 


ATGGAACGAC 


720 


AGCTGCCTTG 


GPPPTPTTPA 




1 AAAAAGGGT 


GGAGTGATGG 


CCTGTAACCA 


780 


GGTCGGTGGT 


PTATPTPPTP 


LL1 X lAlLLL 


1 GTTTCTGAG 


GATGAAGGAA 


TGATTGCTGC 


840 


AGTPP A A A AT 


PPPTPTfTT A 


7\ mm rp A A A A A 
All IALtAAAA 


ACTAGAAGCT 


ATGACGGCTA 


TCTGTTCTTG 


900 


1 loLaAl IvjVjA 


IAIQjAI 1GCC 


ATCCCAGAAG


ATACGCCTGC 


TGAAACTATT 


GCGGCTATGA 


960 


TTGCGGATGA 


AGCAGCAATC 


GGTGTTATCA 


ACATGAAAAC 


AACAGCTGTT 


CGTATCATTC 


1020 


CCAAAGGAAG 


AGAAGGCGAT 


ATGATTGAGT 


TTGGTGGTCT 


ATTAGGAACT 


GCACCCGTTA 


1080 


TGAAGGTTAA 


TGGGGCTTCG 


TCTGTCGACT 


TCATCTCTCG 


CGGTGGACAA 


ATCCCAGCAC 


1140 


CAATTCATAG 


TTTTAAAAAT 


TAAGAAAATA 


GGA 
(2) INFORMATION FOR SEQ ID NO: 81:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1209 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 81:






TCGGAATCTG 


AGCTAGTGTA 


GCTTCCTTAA 


TCTTATCTGA 


TAAGATAGCT 


GTCATATCAG 


60 


ACTCAATCAT 


TTCCTGGAGC 


AATCAACATT 


GACTCGTATA 


TTCCGACTAG 


CGACCTCGCG 


120 


TGCCACAGAC


TTGGTAAAGC 


CAATCAAGCC 


AGCCTTAGAA 


GCAGCATAGT 


TAGCTTGACC 


180 


AATATTCCCC 


ATCAAACCAA 


CAACACTAGA 


CATATTAATG 


ATAGCACCTT 


CTCTGGCTTT 


240 


CATCATCGGT 


TTCAAGACTG 


ATTGTGTCAT 


ATTAAAGGCA 


CCAGTCAGAT 


TGACCTTGAG 


300 


CACTTTTTCA 


AAATCTGCTT 


CTGTCATCTT 


GAGCATAAGA 


GTATCTTGGG 


TAATCCCTGC 


360 


ATTGTTGACC 


AAAACATCTA 


CTGAACCCAG 


TTCTGCAATA 


GCTTGATCAA 


TCATACGCTT 


420 


AGCGTCTGCA 


AAATCTGATA 


CATCTCCTGA 


AATGGGAACC 


ACCTTGATAC 


CATAGTTTGA 


480 


AAACTCAGCG 


AGCAATTCTT 


CTGAGATTGC 


CCCACGACTG 


TTTAAGACAA 


TGTTGGCTCC 


540 


TGCTTGAGCA 


AACTTGTGGG 


CGATGGCAAG 


ACCAATTCCA 


CGACTCGAAC 


CTGTAATAAA 


600 


GATATTTTTA 


TGTTCTAGTT 


TCATTTTTTT 


CCTTTCAAAA 


CTTCTACTTA 


TTTTAGTCTA 


660 



191 



SUBSTITUTE SHEET (RULE 26) 



WO 98/19689 



PCT/US97/19226 



rTifn(Tifnmprp7i A 7\ 
1 I 1 1 1L lAAA 


ALj 1 GC 1 ACTA 


AACTCGCTTG 


ATCTTCCACA 


TGAGCTAAGT 


GAGCAGTTTG 


720 


AILAAI 1111 


ITim A A A A A "A /~i 

1 1AACAAAAC 


CTGACAAGAC


TTTCCCCGGT 


CCAATCTCGA 


ATAAAGTTGC 


780 


mm 7\ mr~ , fT ,r r(T^ 


T""P fTp rp a rp/^ 1 

1 1LIJ. CjL.A1 Cj 


ACCCCAATAC 


TTTCATAGAA 


ACGAACGGGT 


TCCTTGACCT 


840 


^jjAL-LtL-Lj 1 


Vjj/\VjL- 1 VjrAVoL- A 


ATGTCCTCTT 


TTTGCATCAC 


AGCAGCTTCT 


GTATTGCCGA 


900 


pfPAOPOP^ c~* A 

L. 1 ACjCjVjCjAL- A 


a m a a a a m/Tn 
ACj 1 AAAATCT 


GAAAAACTTA 


CCTGAGCTAG 


AGTTTCAGCT 


AGTTTCTGGC 


960 


TAGCAGGCTC 


AAGGAGAGCG 


GTGTGAAAGG 


GACCTGACAC 


CTTAAGAGGA 


ATCAAGCGTT 


1020 


TGGCACCTGC 


TTCTTGCAAA 


AGTTCAACCG 


CTCGATCAAC 


TGCAACCACT 


TCTCCAGCAA 


1080 


TGACGATTTG 


TGCAGGTGTG 


TTATAGTTGG 


CTGGAGTAAC 


CACTCCAAGT 


TCCAGAAGCT 


1140 


TTTTGACAGG 


CTTCTTCAAT 


GACCTCTACT 


GGCGTATTGA 


GAACTGCTAC 


CATCTTGCCA 


1200 


AGTTCAGCA 












1209 



(2) INFORMATION FOR SEQ ID NO: 82: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 813 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 82:






ATGACACGTC 


TGTTCTCTCA 


AGCAGAAATG 


GCAGAGTAAC 


AAGCTCGATA 


TTGAGGTAGC 


60 


CGATAAAGAA 


TTGGCTGAAT 


TTGAAGCTCA 


GATTAAACAG 


GAAGTGGAAG 


CTCCAACTTG 


120 


TAGTGAGTCC 


TCAGGTTGAA 


GAAGAGCCTC 


AGCTCATCCA 


GTTGGCCCAA 


TGTATGAAGA 


180 


ACCAGAAGTA 


AATCCAGTGC 


ATCCGACAGG 


TCCAACACCA 


GCTACAGAAA 


CTGTTGATTC 


240 


AATACCGGGA 


TTTGAAGCAC 


CGCAAGAATC 


TGTTACAATT 


TTATAAGAAA 


TATTCTGAGA 


300 


ACAATATCTT 


ATCCTTATAT 


TTCCAGCGAG 


CAGGAAATGG 


TGTGAGTCCT 


GCATTCCCTA 


360 


TCGATAAGAT 


TATCCTCTCA 


AACTATCAAG 


TCTGAATCTA 


GTAAGATTTG 


ACGTTCCCCA 


420 


CGTTACGGGA 


TAAGAGAGAG 


AAAGACTAAA 


TCTTTTTCCG 


AATAAAGGTG 


GTACCACGAT


480 


TTTCGTCCTT 


TTTGGAAGTC 


GTGGTTTTTA 


ATTTGTTATT 


ATTTATAAAG 


GAGATACCAT 


540 


GAAACTCAAA 


GACACCCTTA 


ATCTTGGGAA 


AACTGAATTC 


CCAATGCGTG 


CAGGCCTTCC 


600 


TACCAAAGAG 


CCAGTTTGGC 


AAAAGGAATG 


GGAAGATGCA 


AAACTTTATC 


AACGTCGTCA 


660 


AGAATTGAAC 


CAAGGAAAAC 


CTCATTTCAC 


CTTGCATGAT 


GGCCCTCCAT 


ACGCTAACGG 


720 


AAATATCCAC 


GTTGGACATG 


CTATGAACAA 


GATTTCAAAA 


GATATCATTG 


TTCGTTCTAA 


780 


GTCTATGTCA 


GGATTTTACG 


CGCCATTTAT 


TCC 






813 



(2) INFORMATION FOR SEQ ID NO: 83: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 953 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear





(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 83:






ATCGAATTAT 


TTTGAAACAA 


GGTGGATCAG 


LiAl rTTGGC 


CTTGATTAGT 


ATTTTACTCT 


60 


TTAAATACAC 


TTGAAGGTCG 


ATTCTAATCT 


CCjCTAATCCT 


TTTTAATCCA 


GAATAAGGGA 


120


AATATf?TTAT 

-ii-ii x n x vj x x x 


ACTTGTTTTT 


AAGAAAAAAG 


111 L. ATTGAA 


TTGGTTTTGA 


GGAGTTAGAA 


180




TAGTGACAGG 


TTTTGAGCCC 


TTTTGAGGCC 


ATTAAAGGTT 


TACCAGCTGA


240 


AATPrATHflT 


GCTGAGGTCC 


GTTGGCTAGA 


GGTGCCGACA 


GTTTTTCACA 


AATCTGCTCA 


300 


AnTATTH^AA 


GAAGAGATGA 


ATCGTTATCA 


ACCTGACTTT 


GTCCTTTGTA 


TTGGGCAAGC 


360 




ACTAGTTTGA 


CACCTGAACG 


AGTGGCCATT 


AATCAAGACG 


ATGCACGTAC 


420 




GAAGATAATC 


AACCGATTGA 


CCGTCCCATT 


CGCCCAGATG 


GTGCTTCGGC 


480 


CTACTTTAGT 


AGTTTGCCGA 


TTAAAGCGAT 


GGTTCAAGCT 


ATAAAAAAGA 


AGGATTACCG 


540 


GCCTCTGTTT 


CCAATACGGC 


AGGGACTTTT


GTCTGCAGCC 


ATTTGATGTA 


TCAGGCTCTC 


600 


TATTTGGTAG 


AAAAGAAATT 


CCCATATGTT 


AAGGCAGGTT 


TTATGCATAT 


TCCTTATATG 


660 


ATGGAACAGG 


TGGTGAACAG 


ACCGACTACT 


CCAACTATGA 


GTTTAGTGGA 


TATTCGGCGA 


720 


GGGATAGAAG 


CAGCAATCGG 


CGCTATGATA 


GAACATGGAG 


ATCAGGAACT 


CAAGTTGGTA 


780 


GGCGGAGAAA 


TTCATTGATA 


GAAAAAAGCT 


TGAGGGGAAA 


ACCTTCAAGC 


TTTTGGACGT 


840 


TTTCGAGCCA 


ATACTGCTCG 


GTAAAACATA 


ATTTTAGTGC 


ATTGGATATA 


AGGTAGGAGT 


900 


GAAAAACTAG


CAATGCCAAA 


GGTAATCCAA 


TTGAGGAAGT 


ACCAAGGAAG 


AAG 


953 



(2) INFORMATION FOR SEQ ID NO: 84: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1060 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 84: 



CTACTTGAAA 


CAGAACTGAA 


ATTATACCCA 


CTACCTCCCT 


GATTATCTTC 


AATGCTTACG 


60 


TCTAAATAAA


CTTCCCCACT 


ATTATTTAGC 


TTAGCAACAA 


CTGTTATAGT 


AAAATAACAT 


120 


AAAATTCACA 


TAAATAGATT 


AGGGAAATCA 


AAGCAACTTC 


TAGGAATGTT 


TTAGCAGTCA 


180 


CAGTGTACTT 


TCCCAGCATC 


AAGCCACTAT 


AACTCTGCAC 


ATAAAAATGG 


AGAAGATGGC


240 


CATCCTCTTC 


TCCAAATATT 


AACTTCTTTA 


CAAACCAACT 


ATAGTTGACA 


AAGAACCTAA 


300 


AATCAATTGA 


TAACACGAGG 


TCAGGTCGGT 


CAACTCTTTC 


AACTGAAGCC 


CTGTCAACTC 


360 


TTCCCATTTA 


TCAATCTTGT 


ATTGGAGAGA 


ATTGCGGTGC 


AGATAGAGTT 


GCTGGGCTGT 


420 


TTAAGTGAGA 


ACAGCACTAT 


TTTCCCAAAG 


AGAGAGAATG 


ATTTCCTGAA 


TCTGATCTTG 


480 


ATCCAAAATC 


ATCTGGTGTA 


GACATTCCTT 


GATTGGCTTC 


AAGTCCACGA 


GTCTTTCTCC 


540 


CAGACTCCAA 


AGATAGAGCT


GAGAAAAAGT 


ATGAACACCT 


TGGTGACCCT 


GACGCCACCA 


600 
TGTCTTGAAC AAATCCCGCT CAGCTTTGAT TAAGTCTGAT AGGGCTTGAT GTCCCGTCTG 660

AGACCAAACC TGACCCAACA TGATAGAAAG ACGAAGTCCA AAGTCATACT CAACCGCTTC 720

AATCGTATCA CTTAAAATAT CTCTTACAGA AGTGTATTTG TCTTGTTGAA GCACGAAAAC 780 

ATAATCCTGA GATCCGACCT GTAGCACTGT CTGACAATTC GGAAAAAGAG TCCGCATCAT 840 

ATCTAGCCAA GAAGCCAGAT TTTCCTGCTG AAAATAAGAA AGATGGCAAT AAACCAACTG 900 

AATCTTTTTA AAAACTTGCG GTGCCTGTCC CTTGCCTTCA ACCAGATAGG AATACCAAGG 960

GTTTAGCGAA CGAACCTGCT CCTGCTGGGT CAAAAGGGCA ACCAACTGCT TTTCACGCTC 1020

GCTGAGCCCA GCTTCCTCCA GCAAAATCCA CTGCTGAGAG 1060

(2) INFORMATION FOR SEQ ID NO: 85: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 895 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 85:






ATTTTAGACT 


TTGATGACAA 


TCCTCAGGCG 


GTTATCATGC 


CCAATCACGA 


GGGGCTGGAA 


60 


TTGCAGTTGC 


CAAAGAAGTG 


TGTTTATGCA 


TTTTTAGGTG 


AGGAGATCTG 


ACCGCTATGC 


120 


AAGGGAAGTA 


GGGGCGGATT 


GTGTCGGCGA 


ATTCGTTTCT 


GCTACCAAGA 


CCTATCCAGT 


180 


CTCTTTCATC 


AACTACAAGG 


GTGAGGAGGT 


CTGTCTGGAT 


CAGGCTCCTG 


CTGGCTCCGC 


240 


TCCAGCAGCC 


CAGTTTATGG 


ATGGGTTGAT 


TGGCTATGGT 


GTGGAGCAGC 


TTATCTCTAC 


300 


TGGGACCTGT 


GGTGTCCTAG 


CTGATATAGA


GGAAAATGCC 


TTTCTAGTCC 


CTGTTCGCGC 


360 


TTTGCGAGAT 


GAGGGAGCCA 


GTTACCACTA 


TGTGGCACCT 


TGTCGTTATA 


TGGAAATGCA 


420 


GCCAGAGGCT 


ATTGCTGCTA 


TTGAGGAAGT 


TTTGGAAGAC 


AGAGGGATTC 


CTTATGAAGA 


480 


AGTCATGACC 


TGGACGACAG


ACGGTTTTTA 


CCGAGAAACG


GCTGAAAAGG 


TGGCTTATCG 


540 


TAAGGAAGAA 


GGCTGTGCTG 


TTGTGGAGAT 


GGAGTGTTCT 


GCTCTTGCGG 


CAGTAGCTCA 


600 


ATTGCGTGGG 


GTTCTCTGGG 


GTGAATTGTT 


GTTCACAGCA 


AATTCTCTAG 


CGGACTTGGA 


660 


CCAGTACAAC 


AGTCGTGACT 


GGGGCTCGGA 


ACCTTTTAAT 


AAGGCGCTAA 


AACTGAGTTT 


720 


AGCAAGTGTC 


CACCACCTTT 


AGTTGTACTG 


GCAAAGGATT 


TGTTTTATCA 


TAAAATGTCT 


780 


AGCTCATACT 


TTTCAAAAAT 


ATGTTTAAAC 


GAAGTCACCT 


TCCTCTTGTC 


CTAAGCATGT 


840 


TTGAAGTTGG 


GAAAAATCTT 


TAAAATCAGA 


AAAACGTATC 


ATATCAGGTT 


GATGA 


895 



(2) INFORMATION FOR SEQ ID NO: 86: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 645 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 86:



AGGGCTGTCA 


AGCTTGGTTA 


v_jxi_ti.v — \j x x x xiVj 


AAA a ar ap 


1 1 PdPi\D\3 1 


AAA rpnmm a tT^ 


60


AATTTTTACG 


AAAAGTATCG 


TQTPTATPTfS 

A w -L \_ X ^i. X V — JL \J 






111 CjjCjjC ACj I A 


120


r ,r P r T APPS rnmp 
Kj 1 i AL L-A 1 i O 


rnrnrprn a A If" 1 TvTO O 
111 lAJMVjJNCaC 


10> I ACTCGTC 


TTTTTTCTAA 


ATATTCCAGG 


AAAAGGTGTC 


180 


TTAAAACTCG 


ATAATGGAAC 


GATTGTTTAT 


GATGGCAGTC 


TTGTCCGTGG 


TAAAATGAAT 


240 


GGCCAAGGTA 


CCATTACCTT 


CCAAAATGGA 


GACCAATATA


CAGGTGGCTT 


CAACAATGGA 


300 


GCCTTCAACG 


GAAAAGGTAC 


CTTTCAATCT 


AAAGAAGGCT 


GGACCTACGA 


AGGTGATTTT 


360 


GTAAATGGTC 


AGGCTGAAGG 


AAAAGGGAAA 


CTAACAACAG 


AACAAGAAGT 


CGTTTATGAA 


420 


GGAACTTTTA 


AACAAGGCGT 


TTTTCAACAA 


AAATAAAGCC 


TCCTTATCAA 


AGGAGGTATT 


480 


ATTAGAATTA 


CAAGGTAAGC 


GTTTACCTGT 


AAATCCCTTT 


CTTTCCAAAT 


CCCTCTTCCA 


540 


AGCAAGTTTG 


TGAAATAAAA 


AATATTTGAA 


ATAAATTTCA 


CAAACTTCAA 


AGATAAAACC


600 


TGATAAGAAA 


AGAAAATGAG 


AAAAGTTTCG 


CAAGAGTTTA 


AAAAT 




645 



(2) INFORMATION FOR SEQ ID NO: 87: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 572 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 87: 



GAGATCTGTC 


TTGACACCAA 


AAGTGTGGAG 


TACGCCAGCT 


AATTCAACGG 


CGATATAACC 


60 


AGCGCCTAGA 


ATCGCAATTG 


ACTCTGGAAG 


TTCTTCCCAG 


GCAAATACAT 


CATCAGAAGA 


120 


GCCACCTAGC 


TCAGCACCAG 


GAATATTAGG 


AATACTTGGA 


TGGGCACCTG 


TAGCAATCAC 


180 


GATATGTCTA 


GCACGAATCA 


GTTCACCATT 


TACGCTTACA 


GTATGAGAAT 


CTACAAATTC


240 


AGCATGACCT 


TCAATCAAGT 


CTACACCGTT 


GCGTTTAAAA 


CTACCATCAT 


AGAGAAGAAC 


300 


GAGCGCGATC 


AATGTAGGCT 


TCACGATTGC 


GACGTAGGGT 


TGCAAAGTTA 


AAGTTAAGAT


360 


CAGTAGTCTC 


AAAGCCGTAG 


TCTCCTCCAA 


ATTGATGGAA 


AGTCTCAGCG 


ATTTGCGCCC 


420 


CGCTACCACA 


TGATTCTTTT 


AGGAACACAA 


CCGACGTTGA 


CACAGGTTCC 


ACCTAATTTC 


480 


TTTTCCTCAA 


TAACGGCTGC 


TTTGGCTCCA 


TGTTCCCAGC 


ACGGTTCATG 


GTAGCGATCC 


540 


TCCGCTACCT 


CCACGATAGC 


AATGATATCA 


TA 






572 



(2) INFORMATION FOR SEQ ID NO: 88: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 49 amino acids 




(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: None 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 88: 

Val Gly Asp Asp Thr Trp Leu Phe Asp Pro Ala Lys Asp Pro Val Ile

1 5 10 15

Met Ile Leu Pro Glu Thr Phe Phe Leu His Ala Phe Leu Leu Phe Phe

20 25 30 

Ala Leu Tyr Glu Asn Phe Phe Gly Tyr Leu Tyr Leu Lys Ser Arg Arg 
35 40 45 

Lys 



(2) INFORMATION FOR SEQ ID NO: 89: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: None 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 89: 

Val Gln Asp Phe Tyr Thr Ser Ile Asp Val Leu Ala Glu Leu Asp Asn

1 5 10 15

Gly Thr Gln Val Ile Ile Glu Ile Gln Val His His Gln Asn Phe Ser

20 25 30 

Ser Ile Thr Cys Gly Leu Thr Cys Ala Val Arg Leu Ile Lys Ser
35 40 45 

(2) INFORMATION FOR SEQ ID NO: 90: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 67 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 90: 

Val Phe Ala Tyr Phe Thr Lys Pro Leu Gly Ile Lys Leu Pro Pro Tyr

1 5 10 15

Phe Asp Ile Val His Phe Asp Gln Ala Ala Ala Ile Phe Asn Lys Tyr

20 25 30 

Pro Leu Lys Phe Val Asn Cys Val Asn Ser Ile Gly Asn Gly Leu Tyr

35 40 45 

Ile Glu Asp Glu Ser Val Val Ile Arg Pro Lys Asn Gly Phe Gly Gly

50 55 60 

Ile Gly Gly
65 

(2) INFORMATION FOR SEQ ID NO: 91: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 97 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: None 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 91: 

Val Glu Glu Val Glu Val Ala Glu Val Lys Asn Ala Arg Val Ser Leu 

1 5 10 15

Thr Gly Glu Lys Thr Lys Pro Met Lys Leu Ala Glu Val Thr Ser Ile

20 25 30 

Asn Val Asn Arg Thr Lys Thr Glu Met Glu Glu Phe Asn Arg Val Leu 

35 40 45 

Gly Gly Gly Val Val Pro Gly Lys Ser Arg Pro His Arg Trp Gly Ser 

50 55 60 

Trp Asp Trp Glu Ile Asn Ser Ser Pro Thr Ser Leu Asn Pro Val Val
65 70 75 80 

Pro Ser Gly Asp Ser Ser Leu Cys Gln Trp Gly Gly Val Cys Pro Ala
85 90 95 

Asp 
(2) INFORMATION FOR SEQ ID NO: 92:



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 75 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: None 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 92: 



Val Asp Val Phe Tyr Asp Gly Gln Thr Phe Thr Ile Leu Glu Asn Pro
1 5 10 15

Val Ile Gln Gly Gln Asn Ala Gly Ala Gly Cys Thr Phe Ala Ser Ser
20 25 30

Ile Ala Ser His Leu Val Lys Gly Asp Lys Leu Leu Pro Ala Val Glu
35 40 45

Ser Ser Lys Ala Phe Val Tyr Arg Ala Ile Ala Gln Ala Asp Gln Tyr
50 55 60

Gly Val Arg Gln Tyr Glu Ala Asn Lys Asn Asn
65 70 75


(2) INFORMATION FOR SEQ ID NO: 93: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 65 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: None 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 93: 



Val Ile Ser Val Arg Glu Lys Ser Leu Lys Val Pro Ala Ile Leu Glu
1 5 10 15

Ala Val Glu Ala Thr Leu Gly Arg Pro Ala Phe Val Ser Phe Asp Ala
20 25 30

Glu Lys Leu Glu Gly Ser Leu Thr Arg Leu Pro Glu Arg Asp Glu Ile
35 40 45

Asn Pro Glu Ile Asn Glu Ala Leu Val Val Glu Phe Tyr Asn Lys Met
50 55 60

Leu
65



(2) INFORMATION FOR SEQ ID NO: 94: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 134 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: None 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 94: 



Val Ile Val Glu Lys Glu Glu Lys Gly Glu Glu Met Lys Pro Val Ile
1 5 10 15

Ser Ile Ile Met Gly Ser Lys Ser Asp Trp Ala Thr Met Gln Lys Thr
20 25 30

Ala Glu Val Leu Asp Arg Phe Gly Val Ala Tyr Glu Lys Lys Val Val
35 40 45

Ser Ala His Arg Thr Pro Asp Leu Met Phe Lys His Ala Glu Glu Ala
50 55 60

Arg Ser Arg Gly Ile Lys Ile Ile Ile Ala Gly Ala Gly Gly Ala Ala
65 70 75 80

His Leu Pro Gly Met Val Ala Ala Lys Thr Thr Leu Pro Val Ile Gly
85 90 95

Val Pro Val Lys Ser Arg Ala Leu Ser Gly Val Asp Ser Leu Tyr Ser
100 105 110

Ile Val Gln Met Pro Gly Gly Val Pro Val Ala Thr Met Ala Ile Gly
115 120 125

Glu Leu Phe Phe Arg Ile
130



(2) INFORMATION FOR SEQ ID NO: 95: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 66 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: None 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 95:

Val Arg Xaa Xaa Ala Pro Ser Thr Cys Xaa Trp Val Gly His Met Ala 

1 5 10 15

Ser Gly Leu Arg His Asp Thr Lys Ala Pro Tyr Ser Asp Ser Xaa Xaa 

20 25 30 

Leu Gly Leu Arg Leu Phe Asn Leu Thr Thr Gln Gln Asn Xaa Thr Arg

35 40 45 

Arg Phe Ile Leu Gln Lys Ala Xaa Ser His Pro Leu Thr Gly Ser Asn

50 55 60 

Leu Leu 
65 

(2) INFORMATION FOR SEQ ID NO: 96: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: None 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 96: 

Val Asp Asp Thr Asn Thr Leu Asn Val His Ile His Ala Leu Arg Gln

1 5 10 15

Glu Leu Ala Lys Tyr Ser Ser Asp Gln Thr Pro Thr Ile Lys Thr Val

20 25 30 

Trp Gly Leu Gly Tyr Lys Ile Glu Lys Pro Arg Gly Gln Thr
35 40 45 

(2) INFORMATION FOR SEQ ID NO: 97: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 169 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: None 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 97:



Val Ile Tyr Asn Ile Pro Gln Leu Ala Gly Val Ala Leu Thr Pro Ser
1 5 10 15

Leu Tyr Thr Glu Met Leu Lys Asn Pro Arg Val Ile Gly Val Lys Asn
20 25 30

Ser Ser Met Pro Val Gln Asp Ile Gln Thr Phe Val Ser Leu Gly Gly
35 40 45

Glu Asp His Ile Val Phe Asn Gly Pro Asp Glu Gln Phe Leu Gly Gly
50 55 60

Arg Leu Met Gly Ala Arg Ala Gly Ile Gly Gly Thr Tyr Gly Ala Met
65 70 75 80

Pro Glu Leu Phe Leu Lys Leu Asn Gln Leu Ile Ala Asp Lys Asp Leu
85 90 95

Glu Thr Ala Arg Glu Leu Gln Tyr Ala Ile Asn Ala Ile Ile Gly Lys
100 105 110

Leu Thr Ser Ala His Gly Asn Met Tyr Gly Val Ile Lys Glu Val Leu
115 120 125

Lys Ile Asn Glu Gly Leu Asn Ile Gly Ser Val Arg Ser Pro Leu Thr
130 135 140

Pro Val Thr Glu Glu Asp Arg Pro Val Val Glu Ala Ala Ala Ala Leu
145 150 155 160

Ile Arg Glu Thr Lys Glu Arg Phe Leu
165



(2) INFORMATION FOR SEQ ID NO: 98: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 288 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 98:

Val Thr Tyr Asp Thr Ile Gln Phe Lys Val Leu Lys Ala Val Ile Asp
1 5 10 15

Gln Ala Phe Leu Arg Val Lys Gly Tyr Thr Leu Asn Gly His Thr Leu
20 25 30

Pro Gly Gln Val Gln Gln Phe Asn Gln Val Phe Ile Asn Asn His Arg
35 40 45

Ile Thr Pro Glu Val Thr Tyr Lys Lys Ile Asn Glu Thr Thr Ala Glu
50 55 60

Tyr Leu Met Lys Leu Arg Asp Asp Ala His Leu Ile Asn Ala Glu Met
65 70 75 80

Thr Val Arg Leu Gln Val Val Asp Asn Gln Leu His Phe Asp Val Thr
85 90 95

Lys Ile Val Asn His Asn Gln Val Thr Pro Gly Gln Lys Ile Asp Asp
100 105 110

Glu Arg Lys Leu Leu Ser Ser Ile Ser Phe Leu Gly Asn Ala Leu Val
115 120 125

Ser Val Ser Ser Asp Gln Thr Gly Ala Lys Phe Asp Gly Ala Thr Met
130 135 140

Ser Asn Asn Thr His Val Ser Gly Asp Asp His Ile Asp Val Thr Asn
145 150 155 160

Pro Met Lys Asp Leu Ala Lys Gly Tyr Met Tyr Gly Phe Val Ser Thr
165 170 175

Asp Lys Leu Ala Ala Gly Val Trp Ser Asn Ser Gln Asn Ser Tyr Gly
180 185 190

Gly Gly Ser Asn Asp Trp Thr Arg Leu Thr Ala Tyr Lys Glu Thr Val
195 200 205

Gly Asn Ala Asn Tyr Val Gly Ile His Ser Ser Glu Trp Gln Trp Glu
210 215 220

Lys Ala Tyr Lys Gly Ile Val Phe Pro Glu Tyr Thr Lys Glu Leu Pro
225 230 235 240

Ser Ala Lys Val Val Ile Thr Glu Asp Ala Asn Ala Asp Lys Lys Val
245 250 255

Asp Trp Gln Asp Gly Ala Ile Ala Tyr Arg Ser Ile Met Asn Asn Pro
260 265 270

Gln Gly Trp Glu Lys Val Lys Asp Ile Thr Ala Met Thr Leu Val Thr
275 280 285
285 



(2) INFORMATION FOR SEQ ID NO: 99: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 66 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 99:

Val Ile Leu Glu Gly Asn Tyr Arg Ala Thr Ala Gly Arg Glu Glu Met
1 5 10 15

Lys Glu Ala Ile Leu Glu Tyr Gln Ala Asn Pro Ala Ala Leu Lys Asp
20 25 30

Leu Lys Glu Lys Ala Lys Asn Ile Ser Arg Glu Tyr Ser Glu Glu His
35 40 45

Leu Leu Gln Ile Trp Leu Asp Phe Tyr Glu Lys Gln Ala Ala Leu Gly
50 55 60

Thr Lys
65



(2) INFORMATION FOR SEQ ID NO: 100: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 107 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 100:

Val Thr Phe Leu Asp Asp Tyr His Lys Lys His Asn Tyr Pro Leu Phe
1 5 10 15

Tyr Glu Ser Tyr Leu Gln Asn Val Met Glu Phe Leu Glu Ser Gln Asp
20 25 30

Ile Lys Asn Gly Val Asp Ala Phe Val Asp Asp His Gln Asn Leu Val
35 40 45

Phe Val Leu Tyr Gly Gln Gly Tyr Arg Ala Glu Gly Lys Glu Gly Ile
50 55 60

Leu Thr Thr Gln Val Thr Val Lys Ala Tyr Asp Glu Asp Lys Lys Pro
65 70 75 80

Ile Asn Phe Ala Asn Leu Leu Asp Ser Leu Ile Val Ser Glu Tyr Gln
85 90 95

Met Glu Pro Asn Leu Trp Glu Val Ser Tyr Asp
100 105



(2) INFORMATION FOR SEQ ID NO: 101: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 185 amino acids 

(B) TYPE: amino acid 
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 101:

Val Arg Lys Ser Val Pro Arg Pro Arg Leu Arg Gln Arg Ser Leu Ser
1 5 10 15

Lys Val Ala Arg Ser Arg Leu Lys Ile Lys Lys Leu Ser Lys Val Lys
20 25 30

His Glu Gly Gly Val Val Ile Glu Gly Ala Ser Gly Leu Leu Val Arg
35 40 45

Ile Ala Lys Cys Cys Asn Pro Val Pro Gly Asp Asp Ile Val Gly Tyr
50 55 60

Ile Thr Lys Gly Arg Gly Val Ala Ile His Arg Val Asp Cys Met Asn
65 70 75 80

Leu Arg Ala Gln Glu Asn Tyr Glu Gln Arg Leu Leu Asp Val Glu Trp
85 90 95

Glu Asp Gln Tyr Ser Ser Ser Asn Lys Glu Tyr Met Ala His Ile Asp
100 105 110

Ile Tyr Gly Leu Asn Arg Thr Gly Leu Leu Asn Asp Val Leu Gln Val
115 120 125

Leu Ser Asn Thr Thr Lys Asn Ile Ser Thr Val Asn Ala Gln Pro Thr
130 135 140

Lys Asp Met Lys Phe Ala Asn Ile His Val Ser Phe Gly Ile Ala Asn
145 150 155 160

Leu Ser Thr Leu Thr Thr Val Val Asp Lys Ile Lys Ser Val Pro Glu
165 170 175

Val Tyr Ser Val Lys Arg Thr Asn Gly
180 185



(2) INFORMATION FOR SEQ ID NO: 102: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 115 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: None 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 102: 
Val Ile Val Phe Leu Val Tyr Leu Ile Ile Thr Val Gln Lys Leu Gly
1 5 10 15

Arg Val Ile Asp Glu Thr Glu Lys Thr Ile Lys Thr Leu Thr Ser Asp
20 25 30

Val Asp Val Thr Leu His His Thr Asn Glu Leu Leu Ala Lys Val Asn
35 40 45

Val Leu Ala Asp Asp Ile Asn Val Lys Val Ala Thr Ile Asp Pro Leu
50 55 60

Phe Ser Ala Val Ala Asp Leu Ser Leu Ser Val Ser Asp Leu Asn Asp
65 70 75 80

His Ala Arg Val Leu Ser Lys Lys Ala Ser Ser Ala Gly Ser Lys Thr
85 90 95

Leu Lys Thr Gly Ala Ser Leu Ser Ala Leu Arg Leu Ala Ser Lys Phe
100 105 110

Phe Lys Lys
115
110 



(2) INFORMATION FOR SEQ ID NO: 103: 



(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 106 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 103:

Val Thr Gly Asn Trp Gln Ile Leu Phe Gln Gly Lys Met Thr Val Phe
1 5 10 15

Ser Trp Leu Ile Gly Pro Cys Ser Ser Asp Asn Glu Glu Ala Val Leu
20 25 30

Glu Tyr Ala Arg Arg Leu Ser Ala Leu Gln Lys Lys Val Ala Asp Lys
35 40 45

Ile Phe Met Val Met Arg Val Tyr Thr Ala Lys Pro Arg Thr Asn Gly
50 55 60

Asp Gly Tyr Lys Gly Leu Val His Gln Pro Asp Thr Ser Lys Ala Pro
65 70 75 80

Thr Leu Ile Asn Gly Leu Gln Ala Val Arg Gln Leu His Tyr Arg Val
85 90 95

Asp Tyr Arg Asp Trp Phe Asp Asn Gly Arg
100 105

(2) INFORMATION FOR SEQ ID NO: 104: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 71 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 104:

Val Gly Thr Gly Ile Ile Gly Ser Ile Val Ser Tyr Pro Val Met Val
1 5 10 15

Leu Phe Thr Gly Ser Ala Ala Lys Leu Ser Trp Phe Ile Tyr Thr Pro
20 25 30

Arg Phe Phe Gly Ala Thr Leu Ile Gly Thr Ala Ile Ser Phe Ile Ala
35 40 45

Phe Arg Phe Leu Ile Lys Gln Glu Phe Phe Lys Lys Val Gln Gly Tyr
50 55 60

Phe Phe Ala Glu Arg Ile Glu
65 70

(2) INFORMATION FOR SEQ ID NO: 105: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 98 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 105:

Val Ala Ile Ala Arg Gly Leu Ser Met Asn Pro Asp Ile Met Leu Phe
1 5 10 15

Asp Glu Pro Asn Ser Ala Leu Asp Pro Glu Met Val Gly Glu Val Ile
20 25 30

Asn Val Met Lys Glu Leu Ala Glu Gln Gly Met Thr Met Ile Ile Val
35 40 45

Thr His Glu Met Gly Phe Ala Arg Gln Val Ala Asn Arg Val Ile Phe
50 55 60

Thr Ala Asp Gly Glu Phe Leu Glu Asp Gly Thr Pro Asp Gln Ile Phe
65 70 75 80

Asp Asn Pro Gln His Pro Arg Leu Lys Glu Phe Leu Asp Lys Val Leu
85 90 95

Asn Val



(2) INFORMATION FOR SEQ ID NO: 106:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 132 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 106:

Val Gln Ala Val Ser Glu Ser Ala Ala Ala Pro Val Arg Ala Lys Val
1 5 10 15

Arg Pro Thr Tyr Ser Thr Asn Ala Ser Ser Tyr Pro Ile Gly Glu Cys
20 25 30

Thr Trp Gly Val Lys Thr Leu Ala Pro Trp Ala Gly Asp Tyr Trp Gly
35 40 45

Asn Gly Ala Gln Trp Ala Thr Ser Ala Ala Ala Ala Gly Phe Arg Thr
50 55 60

Gly Ser Thr Pro Gln Val Gly Ala Ile Ala Cys Trp Asn Asp Gly Gly
65 70 75 80

Tyr Gly His Val Ala Val Val Thr Ala Val Glu Ser Thr Thr Arg Ile
85 90 95

Gln Val Ser Glu Ser Asn Tyr Ala Gly Asn Arg Thr Ile Gly Asn His
100 105 110

Arg Gly Trp Phe Asn Pro Thr Thr Thr Ser Glu Gly Phe Val Thr Tyr
115 120 125

Ile Tyr Ala Asp
130

(2) INFORMATION FOR SEQ ID NO: 107: 
(i) SEQUENCE CHARACTERISTICS: 




SUBSTITUTE SHEET (RULE 26) 



WO 98/19689 



PCT/US97/19226 



(A) LENGTH: 86 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 107:

Val Ile Leu Leu Asn Ser Glu Glu Lys Val Lys Lys Glu Arg Arg Ser
1 5 10 15

Lys Glu Arg Ile Ser Thr Thr Lys Lys Gly Phe Phe Arg Met Val Leu
20 25 30

Arg Tyr His Leu Thr Leu Leu Gly Gln Gly Thr Gly Val Val Thr Val
35 40 45

Leu Phe Thr Ser Ala Phe Leu Pro Tyr Leu Met Met Ile Gly Leu Ile
50 55 60

Ser Lys Ile Arg Asp Ser Gln Ile Val Pro Asp Ile His Pro Pro Tyr
65 70 75 80

Trp Leu Pro Phe Phe Leu
85

(2) INFORMATION FOR SEQ ID NO: 108:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 308 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 108:

Val Thr Pro Leu Ser Leu Leu Cys Leu Arg Lys Cys Val Arg Asp Glu
1 5 10 15

Asn Val Phe Leu Met Gly Glu Asp Val Gly Val Phe Gly Gly Asp Phe
20 25 30

Gly Thr Ser Val Gly Met Leu Glu Glu Phe Gly Pro Glu Arg Val Arg
35 40 45

Asp Cys Pro Ile Ser Glu Ala Ala Ile Ser Gly Ala Ala Ala Gly Ala
50 55 60

Ala Met Thr Gly Leu Arg Pro Ile Val Asp Met Thr Phe Met Asp Phe
65 70 75 80

Ser Val Ile Ala Met Asp Asn Ile Val Asn Gln Ala Ala Lys Thr Arg
85 90 95

Tyr Met Phe Gly Gly Lys Gly Gln Val Pro Met Thr Val Arg Cys Ala
100 105 110

Ala Gly Asn Gly Val Gly Ser Ala Ala Gln His Ser Gln Ser Leu Glu
115 120 125

Ser Trp Phe Thr His Ile Pro Gly Leu Lys Val Val Ala Pro Gly Thr
130 135 140

Pro Ala Asp Met Lys Gly Leu Leu Lys Ser Ser Ile Arg Asp Asn Asn
145 150 155 160

Pro Val Ile Ile Leu Glu Tyr Lys Ser Glu Phe Asn Gln Lys Gly Glu
165 170 175

Val Pro Val Asp Pro Asp Tyr Thr Ile Pro Leu Gly Val Gly Glu Ile
180 185 190

Lys Arg Gln Gly Thr Asp Val Thr Val Val Thr Tyr Gly Lys Met Leu
195 200 205

Arg Arg Val Val Gln Ala Ala Glu Glu Leu Ala Glu Glu Gly Ile Ser
210 215 220

Val Glu Ile Val Asp Pro Arg Thr Leu Val Pro Leu Asp Lys Asp Ile
225 230 235 240

Ile Ile Asn Ser Val Lys Lys Thr Gly Lys Val Val Leu Val Asn Asp
245 250 255

Ala His Lys Thr Ser Gly Tyr Ile Gly Glu Ile Ser Ala Ile Ile Ser
260 265 270

Glu Ser Glu Ala Phe Asp Tyr Leu Asp Ala Pro Ile Arg Arg Cys Ala
275 280 285

Gly Glu Asp Val Pro Met Pro Tyr Ala Gln Asn Leu Lys Met Cys Asn
290 295 300

Asp Ser Asn Ser
305



(2) INFORMATION FOR SEQ ID NO: 109: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 191 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 109:

Val Asp Gly Ala Thr Thr Ile Asp Ile Gly Ala Ser Thr Gly Gly Phe
1 5 10 15

Thr Asp Val Met Leu Gln Asn Ser Ala Lys Leu Val Phe Ala Val Asp
20 25 30

Val Gly Thr Asn Gln Leu Ala Trp Lys Leu Arg Gln Asp Pro Arg Val
35 40 45

Val Ser Met Glu Gln Phe Asn Phe Arg Tyr Ala Glu Lys Thr Asp Phe
50 55 60

Glu Gln Glu Pro Ser Phe Ala Ser Ile Asp Val Ser Phe Ile Ser Leu
65 70 75 80

Ser Leu Ile Leu Pro Ala Leu His Arg Val Leu Ala Asp Gln Gly Gln
85 90 95

Val Val Ala Leu Val Lys Pro Gln Phe Glu Ala Gly Arg Glu Gln Ile
100 105 110

Gly Lys Asn Gly Ile Ile Arg Asp Ala Lys Ile His Gln Asn Val Leu
115 120 125

Glu Ser Val Thr Ala Met Ala Val Glu Ala Gly Phe Ser Val Leu Gly
130 135 140

Leu Asp Phe Ser Pro Ile Gln Gly Gly His Gly Asn Ile Glu Phe Leu
145 150 155 160

Val Tyr Leu Lys Lys Glu Lys Ser Ala Ser Asn Gln Ile Leu Ala Glu
165 170 175

Ile Lys Glu Ala Val Glu Arg Ala His Ser Gln Phe Lys Asn Glu
180 185 190

(2) INFORMATION FOR SEQ ID NO: 110: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 54 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 110:

Val Ser Ser Asp Val Lys Trp Leu Cys Gln Asn His Pro Lys Trp His
1 5 10 15

Lys Leu Arg Gly Ile Gly Met Thr Arg Asn Thr Ile Asp Arg Asp Gly
20 25 30

Ile Thr Ser Gln Asp Val Arg Tyr Phe Ile Phe Asn Phe Lys Leu Asp
35 40 45

Val Asp Asp Leu Leu Pro
50



(2) INFORMATION FOR SEQ ID NO: 111: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 126 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 111:

Val Asp Leu Gln Ser Lys Asn Trp Ser Phe Val His Arg Phe Ser Glu
1 5 10 15

Glu Leu Ile Asp Gln His Tyr Gln Asp Leu Val Gly Gln Ser Phe Tyr
20 25 30

Pro Pro Ile Arg Glu Phe Met Thr Ser Gly Pro Val Leu Val Gly Val
35 40 45

Ile Ser Gly Pro Lys Val Ile Glu Thr Trp Arg Thr Met Met Gly Ala
50 55 60

Thr Arg Pro Glu Glu Ala Leu Pro Gly Thr Ile Arg Gly Asp Phe Ala
65 70 75 80

Lys Ala Ala Gly Glu Asn Glu Ile Ile Gln Asn Val Val His Gly Ser
85 90 95

Asp Ser Glu Lys Ser Gln Leu Ser Arg Glu Ile Ala Pro Leu Val Leu
100 105 110

Arg Val Asp Trp Leu Asn Gln Leu Val Lys Ser Ser Phe Glu
115 120 125



(2) INFORMATION FOR SEQ ID NO: 112: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 112:

Val Leu Lys Gly Val Leu Thr Leu Arg Glu Leu Thr Asn Asp Arg Asp
1 5 10 15

Ala Asp Ile Asn Asp Phe Val Lys Val Gly Glu Val Leu Asp Val Leu
20 25 30

Val Leu Arg Gln Val Val Gly Lys Asp Thr Asp Thr Val Thr Tyr Leu
35 40 45

Val Ile
50

(2) INFORMATION FOR SEQ ID NO: 113: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 52 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 113:

Val Gly Glu Pro Phe Ala Asn Leu Ser Asp Leu Leu Asp Thr Tyr Tyr
1 5 10 15

Lys Asp Lys Ala Glu Arg Asp Arg Val Lys Gln Gln Ala Ser Glu Leu
20 25 30

Ile Arg Arg Val Glu Asn Glu Leu Gln Lys Asn Arg His Lys Leu Lys
35 40 45

Lys Gln Glu Lys
50

(2) INFORMATION FOR SEQ ID NO: 114: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 113 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: None 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 114: 
Val Lys Asp Lys Thr Leu Ile Ile Gln His Ser Gly Ala Tyr Ile Ala
1 5 10 15

Arg Tyr Ser Ile Thr Trp Glu Glu Val Pro Val Asp Lys Asp Gly Asn
20 25 30

Gln Val Val Arg Ser His Ser Trp Glu Gly Asn Gly Arg Asn Gln Thr
35 40 45

Ala Gly Phe Val Leu Asn Leu Pro Ile Lys Glu Asn Met Arg Asn Leu
50 55 60

Arg Val Lys Ile Glu Lys Lys Thr Gly Leu Leu Trp Asn Arg Trp Gln
65 70 75 80

Thr Ile Tyr Glu Asn Arg Pro Ile Leu Ala Gln Pro His Arg Lys Ile
85 90 95

Thr His Trp Gly Thr Thr Leu Asn Ser Lys Val Ser Asp Asp Asp Val
100 105 110

Leu
110 



(2) INFORMATION FOR SEQ ID NO: 115: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 40 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: None 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 115: 

Val Leu Gly Ala Gly Lys Arg Leu Thr Gly Tyr Ala Ala Gly Val Glu 

1 5 10 15

Lys Lys Ala Trp Leu Leu Glu His Glu Gly Val Asp Phe Lys Asp Arg 

20 25 30 

Asn Asn Arg Arg Arg Ser Thr Cys 
35 40 

(2) INFORMATION FOR SEQ ID NO: 116: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 69 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 116:

Val His Val Cys Cys Ala Pro Cys Ser Thr Tyr Thr Leu Glu Tyr Leu
1 5 10 15

Thr Lys Tyr Ala Asp Val Thr Ile Tyr Phe Ala Asn Ser Asn Ile His
20 25 30

Pro Lys Ala Glu Tyr His Lys Arg Val Tyr Val Thr Lys Lys Phe Val
35 40 45

Ser Asp Phe Asn Glu Gln Thr Gly Asn Thr Val Gln Tyr Leu Glu Ala
50 55 60

Pro Tyr Glu Pro Asn
65
65 

(2) INFORMATION FOR SEQ ID NO: 117: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 83 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 117:

Val Ala Met Asp Leu Gly Phe Asp Tyr Phe Gly Ser Ala Leu Thr Ile
1 5 10 15

Ser Pro His Lys Asn Ser Gln Thr Ile Asn Ser Ile Gly Ile Asp Val
20 25 30

Gln Lys Ile Tyr Thr Pro His Tyr Leu Pro Asn Asp Phe Lys Lys Asn
35 40 45

Gln Gly Tyr Lys Arg Ser Val Glu Met Arg Glu Glu Tyr Asp Ile Tyr
50 55 60

Arg Gln Cys Tyr Cys Gly Cys Val Tyr Ala Ala Gln Ala Gln Asn Ile
65 70 75 80

Asp Leu Val



(2) INFORMATION FOR SEQ ID NO: 118:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 52 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 118:

Val Thr Asp Gly Val Ile Gln Val Asp Val Leu Gly Ser Ile Val Arg
1 5 10 15

Ser Glu Glu Trp Leu Leu Asp Asn Leu Ser Lys Gln Gly His Asp Asn
20 25 30

Val Ala Asn Ile Phe Ile Ala Glu Tyr Asp Lys Gly Ala Val Thr Val
35 40 45

Val Thr Tyr Lys
50

(2) INFORMATION FOR SEQ ID NO: 119: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 206 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 119:

Val Arg Glu Tyr Arg Thr Tyr Glu Glu Ile Ala Ala Asp Phe Gly Ile
1 5 10 15

His Glu Ser Asn Leu Ile Arg Arg Ser Gln Trp Val Glu Val Thr Leu
20 25 30

Val Gln Ser Gly Val Thr Ile Ser Lys Thr His Leu Ser Ala Glu Asn
35 40 45

Thr Val Ile Val Asp Ala Thr Glu Val Lys Ile Asn Arg Pro Lys Lys
50 55 60

Gln Leu Ala Asn Asp Ser Gly Lys Lys Lys Phe His Ala Met Lys Ala
65 70 75 80

Gln Ala Ile Val Thr Ser Gln Gly Arg Ile Val Ser Leu Asp Ile Ala
85 90 95

Val Asn Tyr Cys His Asp Met Lys Leu Phe Lys Met Ser Arg Arg Asn
100 105 110

Ile Gly Gln Ala Gly Lys Ile Leu Ala Asp Ser Gly Tyr Gln Gly Pro
115 120 125

Met Lys Ile Tyr Pro Gln Ala Gln Thr Pro Arg Lys Ser Ser Lys Leu
130 135 140

Lys Pro Leu Ile Ala Glu Asp Lys Ala Tyr Asn His Ala Leu Ser Lys
145 150 155 160

Glu Arg Ser Lys Val Glu Asn Ile Phe Ala Lys Val Lys Thr Phe Lys
165 170 175

Met Phe Ser Thr Thr Tyr Arg Asn His Arg Lys Arg Phe Gly Leu Arg
180 185 190

Met Asn Leu Ile Ala Gly Ile Ile Asn Tyr Glu Leu Gly Phe
195 200 205



(2) INFORMATION FOR SEQ ID NO: 120:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 91 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 120:

Val Met Gly Pro Gln Gly Asn Gly Phe Asp Leu Ser Asp Leu Asp Glu
1 5 10 15

Gln Asn Gln Val Leu Leu Val Gly Gly Gly Ile Gly Val Pro Pro Leu
20 25 30

Leu Glu Val Ala Lys Glu Leu His Glu Arg Gly Val Lys Val Val Thr
35 40 45

Val Leu Gly Phe Ala Asn Lys Asp Ala Val Ile Leu Lys Thr Glu Leu
50 55 60

Ala Gln Tyr Gly Gln Val Phe Val Thr Thr Asp Asp Gly Ser Tyr Gly
65 70 75 80

Ile Lys Gly Asn Val Pro Leu Leu Ser Met Ile
85 90



(2) INFORMATION FOR SEQ ID NO: 121:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 222 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 121:

Val Lys Met Val Leu Phe Ser Ala Gln Glu Gln Leu Tyr Tyr Lys Glu
1 5 10 15

Lys Ile Met Thr Thr Asn Arg Leu Gln Val Ser Leu Pro Gly Leu Asp
20 25 30

Leu Lys Asn Pro Ile Ile Pro Ala Ser Gly Cys Phe Gly Phe Gly Gln
35 40 45

Glu Tyr Ala Lys Tyr Tyr Asp Leu Asp Leu Leu Gly Ser Ile Met Ile
50 55 60

Lys Ala Thr Thr Leu Glu Pro Arg Phe Gly Asn Pro Thr Pro Arg Val
65 70 75 80

Ala Glu Thr Pro Ala Gly Met Leu Asn Ala Ile Gly Leu Gln Asn Pro
85 90 95

Gly Leu Glu Val Val Leu Ala Glu Lys Leu Pro Trp Leu Glu Arg Glu
100 105 110

Tyr Pro Asn Leu Pro Ile Ile Ala Asn Val Ala Gly Phe Ser Lys Gln
115 120 125

Glu Tyr Ala Ala Val Ser His Gly Ile Ser Lys Ala Thr Asn Ile Lys
130 135 140

Ala Ile Glu Leu Asn Ile Ser Cys Pro Asn Val Asp His Cys Asn His
145 150 155 160

Gly Leu Leu Ile Gly Gln Asp Pro Asp Leu Ala Tyr Asp Val Val Lys
165 170 175

Ala Ala Val Glu Ala Ser Glu Val Pro Val Tyr Val Lys Leu Thr Pro
180 185 190

Ser Val Thr Asp Ile Val Thr Val Ala Lys Ala Ala Glu Asp Ala Gly
195 200 205

Ala Ser Gly Leu Thr Met Ile Ile Leu Trp Trp Asp Ala Leu
210 215 220



(2) INFORMATION FOR SEQ ID NO: 122: 



(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 155 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 122:

Val Ala Thr Gly Gln Asp Lys Ala His Ser Ile Leu Ala Ser Asn Glu
1 5 10 15

Gly Thr Leu His Tyr Leu Val Pro Leu Lys Gln Gly Met Ser Ile Gln
20 25 30

Gln Gly Gln Thr Ile Ala Glu Val Ser Gly Lys Glu Lys Gly Tyr Tyr
35 40 45

Val Glu Ala Phe Val Leu Ala Ser Asp Ile Ser Arg Val Ser Lys Gly
50 55 60

Ala Lys Val Asp Val Ala Ile Thr Gly Val Asn Ser Gln Lys Tyr Gly
65 70 75 80

Thr Leu Lys Gly Gln Val Arg Gln Ile Asp Ser Gly Thr Ile Ser Gln
85 90 95

Glu Thr Lys Glu Gly Asn Ile Ser Leu Tyr Lys Val Met Ile Glu Leu
100 105 110

Glu Thr Leu Thr Leu Lys His Gly Ser Glu Thr Val Ile Leu Gln Lys
115 120 125

Asp Met Pro Val Glu Val Arg Ile Val Tyr Asp Lys Glu Thr Tyr Leu
130 135 140

Asp Trp Ile Leu Glu Met Leu Ser Phe Lys Gln
145 150 155

(2) INFORMATION FOR SEQ ID NO: 123: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 219 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 123:

Val Arg Val Pro Glu Thr Ile Thr Gln Glu Glu Leu Leu Asp Leu Ile
1 5 10 15

Ala Lys Tyr Asn Gln Asp Pro Ala Trp His Gly Ile Leu Val Gln Leu
20 25 30

Pro Leu Pro Lys His Ile Asp Glu Glu Ala Val Leu Leu Ala Ile Asp
35 40 45

Pro Glu Lys Asp Val Asp Gly Phe His Pro Leu Asn Met Gly Arg Leu
50 55 60

Trp Ser Gly His Pro Val Met Ile Pro Ser Thr Pro Ala Gly Ile Met
65 70 75 80

Glu Met Phe His Glu Tyr Gly Ile Asp Leu Glu Gly Lys Asn Ala Val
85 90 95

Val Ile Gly Arg Ser Asn Ile Val Gly Lys Pro Met Ala Gln Leu Leu
100 105 110

Leu Ala Lys Asn Ala Thr Val Thr Leu Ala His Ser Arg Thr His Asn
115 120 125

Leu Ala Lys Val Ala Ala Lys Ala Asp Ile Leu Val Val Ala Ile Gly
130 135 140

Arg Ala Lys Phe Val Thr Ala Asp Phe Val Lys Pro Gly Ala Val Val
145 150 155 160

Ile Asp Val Gly Met Asn Arg Asp Glu Asn Gly Lys Leu Cys Gly Asp
165 170 175

Val Asp Tyr Glu Ala Val Ala Pro Leu Ala Ser His Ile Thr Pro Val
180 185 190

Pro Gly Gly Val Gly Pro Met Thr Ile Thr Met Leu Met Glu Gln Thr
195 200 205

Tyr Gln Ala Ala Leu Arg Thr Leu Asp Arg Lys
210 215
205 



(2) INFORMATION FOR SEQ ID NO: 124:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 83 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 124:

Val Gly Val Tyr Leu Ser Glu Gly Leu Pro Asp Leu Ile Arg Val Thr
1 5 10 15

Thr Val Thr Leu Ile Ser Leu Val Gly Glu Thr Ala Met Ala Gly Ala
20 25 30

Val Gly Ala Gly Gly Ile Gly Asn Val Ala Ile Ala Tyr Gly Phe Asn
35 40 45

Arg Tyr Asn His Asp Val Thr Ile Leu Ala Thr Ile Val Ile Ile Leu
50 55 60

Ile Ile Phe Ala Ile Gln Phe Leu Gly Asp Phe Leu Thr Lys Lys Leu
65 70 75 80

Ser His Lys



(2) INFORMATION FOR SEQ ID NO: 125:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 223 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 125:

Val Leu Pro Leu Tyr Leu Leu Phe Val Pro Tyr Gly Lys Ser Lys Lys
1 5 10 15

Glu Val Lys Lys Arg Ala Lys Glu Ala Ser Arg Leu Thr Arg Glu Met
20 25 30

Lys Gly Leu Ile Phe Thr Leu Ala Ile Glu Ala Ala Val Val Val Cys
35 40 45

Thr Asn Thr Ala Ile Thr Ile Arg Ile Pro Ser Leu Met Val Glu Arg
50 55 60

Gly Leu Gly Asp Ala Gln Leu Ser Ser Phe Val Leu Ser Ile Met Gln
65 70 75 80

Leu Ile Gly Ile Val Ala Gly Val Ser Phe Ser Phe Leu Ile Ser Ile
85 90 95

Phe Lys Glu Lys Leu Leu Leu Trp Ser Gly Ile Thr Phe Gly Leu Gly
100 105 110

Gln Ile Val Ile Ala Leu Ser Ser Ser Leu Trp Val Val Val Ala Gly
115 120 125

Ser Val Leu Ala Gly Phe Ala Tyr Ser Val Val Leu Thr Thr Val Phe
130 135 140

Gln Leu Val Ser Glu Arg Ile Pro Ala Lys Leu Leu Asn Gln Ala Thr
145 150 155 160

Ser Phe Ala Val Leu Gly Cys Ser Phe Gly Ala Phe Thr Thr Pro Phe
165 170 175

Val Leu Gly Ala Ile Gly Leu Leu Thr His Asn Gly Met Leu Val Phe
180 185 190

Ser Ile Leu Gly Gly Trp Leu Ile Val Ile Ser Ile Phe Val Met Tyr
195 200 205

Leu Leu Gln Lys Arg Ala Leu Gly Leu Ile Pro Lys Phe Phe Phe
210 215 220

(2) INFORMATION FOR SEQ ID NO: 126:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 119 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 126:

Val Val Ala Gly Pro Glu Gly Leu Asp Glu Ala Gly Leu Asn Gly Thr
1 5 10 15

Thr Xaa Ile Ala Leu Xaa Glu Asn Gly Glu Ile Ser Leu Ser Ser Phe
20 25 30

Thr Pro Glu Asp Leu Gly Met Glu Gly Tyr Ala Met Glu Asp Ile Arg
35 40 45

Gly Gly Asn Ala Gln Glu Asn Ala Glu Ile Leu Leu Ser Val Leu Lys
50 55 60

Asn Glu Ala Ser Pro Phe Leu Glu Thr Thr Val Leu Asn Ala Gly Leu
65 70 75 80

Gly Phe Tyr Ala Asn Gly Lys Ile Asp Ser Ile Lys Glu Gly Val Ala
85 90 95

Leu Ala Arg Gln Val Ile Ala Arg Gly Lys Ala Leu Glu Lys Leu Arg
100 105 110

Leu Leu Gln Glu Tyr Gln Lys
115

(2) INFORMATION FOR SEQ ID NO: 127:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 112 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 127:

Val Asp Ile Val Gln Gln Ala Gln Thr Tyr Glu Glu Asn Gly Ala Val
1 5 10 15

Met Ile Ser Val Leu Thr Asp Glu Val Phe Phe Lys Gly His Leu Asp
20 25 30

Tyr Leu Arg Glu Ile Ser Ser Gln Val Glu Ile Pro Thr Leu Asn Lys
35 40 45

Asp Phe Ile Ile Asp Glu Lys Gln Ile Ile Arg Ala Arg Asn Ala Gly
50 55 60

Ala Thr Val Ile Leu Leu Ile Val Ala Ala Leu Ser Glu Glu Arg Leu
65 70 75 80

Lys Glu Leu Tyr Asp Tyr Ala Thr Glu Leu Gly Leu Glu Val Leu Val
85 90 95

Glu Thr His Asn Leu Ala Glu Leu Glu Val Ala His Arg Leu Gly Gly
100 105 110

(2) INFORMATION FOR SEQ ID NO: 128:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 65 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 128:

Val Ser Glu Lys His Ala Gly Phe Met Ile Asn Val Ala Asp Gly Thr
1 5 10 15

Ala Lys Asp Tyr Glu Asp Leu Ile Gln Ser Val Ile Glu Lys Val Lys
20 25 30

Glu His Ser Gly Ile Thr Leu Glu Arg Glu Val Arg Ile Leu Gly Glu
35 40 45

Ser Leu Ser Val Ala Lys Met Tyr Ala Gly Gly Phe Thr Pro Cys Lys
50 55 60

Arg
65
(2) INFORMATION FOR SEQ ID NO: 129:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 29 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 129:

Val Glu Arg Ile Ile Arg Lys Ala Phe Ala Ile Glu Leu Gln Glu Ile
1 5 10 15

Ala Glu Lys Ser Leu Leu Val Ser Ile Ser Lys Met Phe
20 25

(2) INFORMATION FOR SEQ ID NO: 130:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 88 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 130:

Val Arg Ile Gly Asn Thr Val Leu Ala Asn Val Thr Ser Gly Val Ala
1 5 10 15

Lys Gln Ala Ser Lys Ala Ala Gln Ala Ser Asn Leu Gly Gly Gly Ala
20 25 30

Glu Val Asp Gly Phe Ser Lys Thr Leu Ser Ser Leu Asp Ile Ser Ile
35 40 45

Gln Thr Ser Asp Phe Ile Ile Ile Phe Val Leu Ala Leu Val Leu Val
50 55 60

Val Leu Val Met Ala Leu Ala Ser Ser Asn Leu Leu Arg Lys Gln Pro
65 70 75 80

Lys Glu Leu Leu Leu Asp Gly Glu
85

(2) INFORMATION FOR SEQ ID NO: 131: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 164 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 131:

Val Ser Asn Lys Thr Phe Pro Ile Leu Val Asn Lys Asp Pro Lys Thr
1 5 10 15

Gly Thr Tyr Ser Gly Ile Glu Thr Asp Leu Ala Lys Met Val Ala Asp
20 25 30

Glu Leu Lys Val Lys Ile His Tyr Val Pro Val Thr Ala Gln Thr Arg
35 40 45

Gly Pro Leu Leu Asp Asn Glu Gln Val Asp Met Asp Ile Ala Thr Phe
50 55 60

Thr Ile Thr Asp Glu Arg Lys Lys Leu Tyr Asn Phe Thr Ser Pro Tyr
65 70 75 80

Tyr Thr Asp Ala Ser Gly Phe Leu Val Asn Lys Ser Ala Lys Ile Lys
85 90 95

Lys Ile Glu Asp Leu Asn Gly Lys Thr Ile Gly Val Ala Gln Gly Ser
100 105 110

Ile Thr Gln Arg Leu Ile Thr Glu Leu Gly Lys Lys Lys Gly Leu Lys
115 120 125

Phe Lys Phe Val Glu Leu Gly Ser Tyr Pro Glu Leu Ile Thr Ser Leu
130 135 140

His Ala His Arg Ile Asp Ala Phe Ser Val Asp Arg Ser Ile Leu Ser
145 150 155 160

Gly Tyr Thr Ser
160 



(2) INFORMATION FOR SEQ ID NO: 132: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 62 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: None 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 132:

Val Leu Glu Glu Leu Arg Ile Pro Ala Pro Asn Glu Phe Glu Asp Leu
1 5 10 15

Asp Leu Ser Pro Leu Asp Phe Lys Pro His Ile Ala Pro His Lys Phe
20 25 30

Glu Gly Met Val Glu Thr Ala Arg Asp Leu Ile Arg Asn Gly Asp Met
35 40 45

Phe Arg Cys Val Thr Gln Pro Ala Phe Ser Ser Arg Arg Ser
50 55 60

(2) INFORMATION FOR SEQ ID NO: 133:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 65 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 133:

Val Ser Ser Ser Phe Phe Thr Pro Leu Lys Gln Leu Ser Lys Phe Leu
1 5 10 15

Ile Ile Met Ala Met Ser Ala Ile Gly Leu Lys Thr Asn Leu Val Ala
20 25 30

Met Val Lys Ser Ser Gly Lys Ser Ile Val Leu Gly Ala Val Cys Trp
35 40 45

Ile Ala Ile Ile Leu Thr Ser Leu Gly Met Gln Thr Leu Ile Gly Ile
50 55 60

Phe
65

(2) INFORMATION FOR SEQ ID NO: 134:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 71 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 134:

Val Pro Glu Asp Tyr Arg Ile Ile Thr Ser Asp Asp Ser Gln Ile Ser
1 5 10 15

Arg Phe Thr Arg Pro Asn Leu Thr Thr Ile Ala Gln Pro Leu Tyr Asp
20 25 30

Leu Gly Ala Ile Ser Met Arg Met Leu Thr Lys Ile Met His Lys Glu
35 40 45

Glu Leu Glu Glu Arg Glu Val Leu Leu Pro His Gly Leu Thr Glu Arg
50 55 60

Ser Ser Thr Arg Lys Arg Lys
65 70

(2) INFORMATION FOR SEQ ID NO: 135:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 163 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 135:

Val Gly Gln Ser Gln Phe Leu Phe Lys Val Ser Tyr Ala Asp Gly Gln
1 5 10 15

Lys Ala Tyr Arg Val Asp Leu Pro Asp Leu Leu Thr Lys Thr Asp Trp
20 25 30

Gln Ile Ile Lys Ser Phe Leu Asp Val Leu Leu Ala Tyr Thr Gly Thr
35 40 45

Asp Ile Glu Gly Leu Asp Gly Phe Asp Phe Glu Ala Tyr Phe Gln Ala
50 55 60

Ser Ile Gln Ala Tyr Leu Ala Asp Pro Val Ala Arg Phe Thr Ile Cys
65 70 75 80

Gln Arg Ile Phe Asn Pro Ile Phe Phe Ser Arg Glu Asn Leu Lys Ser
85 90 95

Phe Leu Glu Ala Asp Gly Leu Ala Gln Phe Glu Ala Arg Val Arg Ala
100 105 110

Val Gln Glu Thr Asp Ala Tyr Phe Ala Arg Val Ser Phe Tyr Gln Asp
115 120 125

Gly Glu Gly Lys Val His Gly Val Tyr His Leu Ala Gln Gly Val Lys
130 135 140

Thr Val Leu Pro Arg Glu Pro Phe Val Pro Ala Ala Tyr Ile Glu Arg
145 150 155 160

Ile Gly Gly



(2) INFORMATION FOR SEQ ID NO: 136:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 69 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 136:

Val Asp Lys Glu Val Gln Trp Glu Ile Asp Leu Val Gln Ile Thr Gly
1 5 10 15

Asp Gly Ser Lys Pro Glu Asp Tyr Glu Ser Ile Ala Arg Leu Asp Tyr
20 25 30

Ala Lys Phe Leu Glu Val Leu Pro Pro Ser Phe Tyr His Gln Leu Asp
35 40 45

Ala Asn Gln Ile Glu Ile Gln Pro Ile Leu Gly Gln Asp Phe Lys Thr
50 55 60

Leu Ala Gln Glu Lys
65

(2) INFORMATION FOR SEQ ID NO: 137:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 299 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 137:

Val Ile Leu Lys Ile Glu Asp Leu Val Met Ser Ile Ile Ser Thr Asp
1 5 10 15

Leu Thr Pro Phe Gln Ile Asp Asp Thr Leu Lys Ala Ala Leu Arg Glu
20 25 30

Asp Val His Ser Glu Asp Tyr Ser Thr Asn Ala Ile Phe Asp His His
35 40 45

Gly Gln Ala Lys Val Ser Leu Phe Ala Lys Glu Ala Gly Val Leu Ala
50 55 60

Gly Leu Thr Val Phe Gln Arg Val Phe Thr Leu Phe Asp Ala Glu Val
65 70 75 80

Thr Phe Gln Asn Pro His Gln Phe Lys Asp Gly Asp Arg Leu Thr Ser
85 90 95

Gly Asp Leu Val Leu Glu Ile Ile Gly Ser Val Arg Ser Leu Leu Thr
100 105 110

Cys Glu Arg Val Ala Leu Asn Phe Leu Gln His Leu Ser Gly Ile Ala
115 120 125

Ser Met Thr Ala Ala Tyr Val Glu Ala Leu Gly Asp Asp Cys Ile Lys
130 135 140

Val Phe Asp Thr Arg Lys Thr Thr Pro Asn Leu Arg Leu Phe Glu Lys
145 150 155 160

Tyr Ala Val Arg Val Gly Gly Gly Tyr Asn His Arg Phe Asn Leu Ser
165 170 175

Asp Ala Ile Leu Leu Lys Asp Asn His Ile Ala Ala Val Gly Ser Val
180 185 190

Gln Arg Ala Ile Ala Gln Ala Arg Ala Tyr Ala Pro Phe Val Lys Met
195 200 205

Val Glu Val Glu Val Glu Ser Leu Ala Ala Ala Glu Glu Ala Ala Ala
210 215 220

Ala Gly Ala Asp Ile Ile Met Leu Asp Asn Met Ser Leu Glu Gln Ile
225 230 235 240

Glu Gln Ala Ile Thr Leu Ile Ala Gly Arg Ser Arg Ile Glu Cys Ser
245 250 255

Gly Asn Ile Asp Met Thr Thr Ile Ser Arg Phe Arg Gly Leu Ala Ile
260 265 270

Asp Tyr Val Ser Ser Gly Ser Leu Thr His Ser Ala Lys Ser Leu Asp
275 280 285

Phe Ser Met Lys Gly Leu Thr Tyr Leu Asp Val
290 295

(2) INFORMATION FOR SEQ ID NO: 138:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 242 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 138:

Val Glu Val Glu Val Pro Thr Gln Val Pro Ala His Ile Gly Ile Ile
1 5 10 15

Met Asp Gly Asn Gly Arg Trp Ala Lys Lys Arg Met Gln Pro Arg Val
20 25 30

Phe Gly His Lys Ala Gly Met Glu Ala Leu Gln Thr Val Thr Lys Ala
35 40 45

Ala Asn Lys Leu Gly Val Lys Val Ile Thr Val Tyr Ala Phe Ser Thr
50 55 60

Glu Asn Trp Thr Arg Pro Asp Gln Glu Val Lys Phe Ile Met Asn Leu
65 70 75 80

Pro Val Glu Phe Tyr Asp Asn Tyr Val Pro Glu Leu His Ala Asn Asn
85 90 95

Val Lys Ile Gln Met Ile Gly Glu Thr Asp Arg Leu Pro Lys Gln Thr
100 105 110

Phe Glu Ala Leu Thr Lys Ala Glu Glu Leu Thr Lys Asn Asn Thr Gly
115 120 125

Leu Ile Leu Asn Phe Ala Leu Asn Tyr Gly Gly Arg Ala Glu Ile Thr
130 135 140

Gln Ala Leu Lys Leu Ile Ser Gln Asp Val Leu Asp Ala Lys Ile Asn
145 150 155 160

Pro Gly Asp Ile Thr Glu Glu Leu Ile Gly Asn Tyr Leu Phe Thr Gln
165 170 175

His Leu Pro Lys Asp Leu Arg Asp Pro Asp Leu Ile Ile Arg Thr Ser
180 185 190

Gly Glu Leu Arg Leu Ser Asn Phe Leu Pro Trp Gln Gly Ala Tyr Ser
195 200 205

Glu Leu Tyr Phe Thr Asp Thr Leu Trp Pro Asp Phe Asp Glu Ala Ala
210 215 220

Leu Gln Glu Ala Ile Leu Ala Tyr Asn Arg Arg His Arg Arg Phe Gly
225 230 235 240

Gly Val



(2) INFORMATION FOR SEQ ID NO: 139:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 183 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 139:

Val Val Ala Tyr Ser Val Leu Ile Ser Ile Met Leu Gly Thr Thr Val
1 5 10 15

Phe Ser Lys Ser Tyr Thr Ile Glu Asp Ala Val Phe Pro Leu Ala Met
20 25 30

Ser Phe Tyr Val Gly Phe Gly Phe Asn Ala Leu Leu Asp Ala Arg Val
35 40 45

Ala Gly Leu Asp Lys Ala Leu Leu Ala Leu Cys Ile Val Trp Ala Thr
50 55 60

Asp Ser Gly Ala Tyr Leu Val Gly Met Asn Tyr Gly Lys Arg Lys Leu
65 70 75 80

Ala Pro Arg Val Ser Pro Asn Lys Thr Leu Glu Gly Ala Leu Gly Gly
85 90 95

Ile Leu Gly Ala Ile Leu Val Thr Ile Ile Phe Met Ile Val Asp Ser
100 105 110

Thr Val Ala Leu Pro Tyr Gly Ile Tyr Lys Met Ser Val Phe Ala Ile
115 120 125

Phe Phe Ser Ile Ala Gly Gln Phe Gly Asp Leu Leu Glu Ser Ser Ile
130 135 140

Lys Arg His Phe Gly Val Lys Asp Ser Gly Lys Phe Ile Pro Gly His
145 150 155 160

Gly Gly Val Leu Asp Arg Phe Asp Ser Met Leu Leu Val Phe Pro Ile
165 170 175

Met His Leu Phe Gly Leu Phe
180



(2) INFORMATION FOR SEQ ID NO: 140:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 95 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 140:

Val Asp Leu Leu Leu Ser Leu Arg Gln Val Val Met Leu Leu Lys Met
1 5 10 15

Glu Leu Arg Ile Phe Leu Tyr Phe Leu Ala Met Ile Ser Ile Asn Ile
20 25 30

Gly Ile Phe Asn Leu Ile Pro Ile Pro Ala Leu Asp Gly Gly Lys Ile
35 40 45

Val Leu Asn Ile Leu Glu Ala Ile Arg Arg Lys Pro Leu Lys Gln Glu
50 55 60

Ile Glu Thr Tyr Val Thr Leu Ala Gly Val Val Ile Met Val Val Leu
65 70 75 80

Met Ile Ala Val Thr Trp Asn Asp Ile Met Arg Leu Phe Phe Arg
85 90 95

(2) INFORMATION FOR SEQ ID NO: 141:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 183 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 141:

Val Glu Leu Met Ser Thr Val Gln Lys Ser Thr Phe Met Lys Cys Val
1 5 10 15

Asn Thr Leu Glu Trp Phe Phe Asn Ala Pro Ile His Leu Leu Asn Arg
20 25 30

Ile Tyr Arg Asn Ile Thr Phe Ala His Glu Arg Ala Gly Val Lys Asp
35 40 45

Lys Gln Val Leu Asp Glu Ile Val Glu Thr Ser Leu Ser Gln Ala Ala
50 55 60

Leu Trp Asp Gln Val Lys Asp Asp Leu His Lys Ser Ala Leu Thr Leu
65 70 75 80

Ser Gly Gly Gln Gln Gln Arg Leu Cys Ile Ala Arg Ala Ile Ser Val
85 90 95

Lys Pro Asp Ile Leu Leu Met Asp Glu Pro Ala Ser Ala Leu Asp Pro
100 105 110

Ile Ala Thr Met Gln Leu Glu Glu Thr Met Phe Glu Leu Lys Lys Asn
115 120 125

Phe Thr Ile Ile Ile Val Thr His Asn Met Gln Gln Ala Ala Arg Ala
130 135 140

Ser Asp Tyr Thr Gly Phe Phe Tyr Leu Gly Asp Leu Ile Glu Tyr Asp
145 150 155 160

Lys Thr Ala Thr Ile Phe Gln Asn Ala Lys Leu Gln Ser Thr Asn Asp
165 170 175

Tyr Val Ser Gly His Phe Gly
180

(2) INFORMATION FOR SEQ ID NO: 142:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 228 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 142:

Val Pro Lys Glu Ser Leu Thr Gln Val Leu Pro Arg Asp Leu His Ala
1 5 10 15

Glu Tyr Phe Ala Val Leu Ala Ser Ile Ala Thr Ser Ile Glu Arg Met
20 25 30

Ala Thr Glu Ile Arg Gly Leu Gln Lys Ser Glu Gln Arg Glu Val Glu
35 40 45

Glu Phe Phe Ala Lys Gly Gln Lys Gly Ser Ser Ala Met Pro His Lys
50 55 60

Arg Asn Pro Ile Gly Ser Glu Asn Met Thr Gly Leu Ala Arg Val Ile
65 70 75 80

Arg Gly His Met Ile Thr Ala Tyr Glu Asn Val Ala Leu Trp His Glu
85 90 95

Arg Asp Ile Ser His Ser Ser Ala Glu Arg Ile Ile Thr Pro Asp Thr
100 105 110

Thr Ile Leu Ile Asp Tyr Met Leu Asn Arg Phe Gly Asn Ile Val Lys
115 120 125

Asn Leu Thr Val Phe Pro Glu Asn Met Ile Arg Asn Met Asn Ser Thr
130 135 140

Phe Gly Leu Ile Phe Ser Gln Arg Ala Met Leu Thr Leu Ile Glu Lys
145 150 155 160

Gly Met Thr Arg Glu Gln Ala Tyr Asp Leu Val Gln Pro Lys Thr Ala
165 170 175

Tyr Ser Trp Asp Asn Gln Val Asp Phe Lys Pro Leu Leu Glu Ala Asp
180 185 190

Ser Glu Val Thr Ser Arg Leu Thr Gln Glu Glu Ile Asp Glu Ile Phe
195 200 205

Asn Pro Val Tyr Tyr Thr Lys Arg Val Asp Asp Ile Phe Glu Arg Leu
210 215 220

Gly Leu Gly Asp
225

(2) INFORMATION FOR SEQ ID NO: 143:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 157 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 143:

Val Ile Phe Ile Ser Thr Leu Ser Leu Gly Gly Leu Ala His Leu Leu
1 5 10 15

Trp Phe Ser Leu Pro Leu Ala Ala Cys Leu Ala Val Gly Ala Ala Leu
20 25 30

Gly Pro Thr Asp Leu Val Ala Phe Ala Ser Leu Ser Glu Arg Phe Ser
35 40 45

Phe Pro Lys Arg Val Ser Asn Ile Leu Lys Gly Glu Gly Leu Leu Asn
50 55 60

Asp Ala Ser Gly Leu Val Ala Phe Gln Val Ala Leu Thr Ala Trp Thr
65 70 75 80

Thr Gly Ala Phe Ser Leu Gly Gln Ala Ser Ser Ser Leu Ile Phe Ser
85 90 95

Ile Leu Gly Gly Phe Leu Ile Gly Phe Leu Thr Ala Met Thr Asn Arg
100 105 110

Phe Leu His Thr Phe Leu Leu Ser Val Arg Ala Thr Asp Ile Ala Ser
115 120 125

Glu Leu Leu Leu Glu Phe Glu Phe Ala Ser Ser Asp Leu Leu Ser Gly
130 135 140

Arg Arg Ser Pro Cys Phe Arg Asp Tyr Cys Arg Arg Ser
145 150 155
(2) INFORMATION FOR SEQ ID NO: 144:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 230 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 144:

Val Thr Phe Phe Leu Ala Glu Glu Val His Val Ser Gly Ile Ile Ala
1 5 10 15

Val Val Val Asp Arg Ile Leu Lys Ala Ser Arg Phe Lys Lys Ile Thr
20 25 30

Leu Leu Glu Ala Gln Val Asp Thr Val Thr Glu Thr Val Trp His Thr
35 40 45

Val Thr Phe Met Leu Asn Gly Ser Val Phe Val Ile Leu Gly Met Glu
50 55 60

Leu Glu Met Ile Ala Glu Pro Ile Leu Thr Asn Pro Ile Tyr Asn Pro
65 70 75 80

Leu Leu Leu Leu Leu Ser Leu Ile Ala Leu Thr Phe Val Leu Phe Val
85 90 95

Ile Arg Phe Ile Met Ile Tyr Gly Tyr Tyr Ala Tyr Arg Thr Arg Arg
100 105 110

Leu Lys Lys Lys Leu Asn Lys Tyr Met Lys Asp Met Phe Leu Leu Thr
115 120 125

Phe Ser Gly Val Lys Gly Thr Val Ser Ile Ala Thr Ile Leu Leu Ile
130 135 140

Pro Ser Asn Leu Glu Gln Glu Tyr Pro Leu Leu Leu Phe Leu Val Ala
145 150 155 160

Gly Val Thr Leu Val Ser Phe Leu Thr Gly Leu Leu Val Leu Pro His
165 170 175

Leu Ser Asp Glu Glu Glu Glu Ser Lys Asp Tyr Leu Met His Ile Ala
180 185 190

Ile Leu Asn Glu Val Thr Leu Glu Leu Glu Lys Glu Leu Glu Asp Thr
195 200 205

Arg Asn Lys Leu Pro Leu Tyr Ala Ala Ile Asp Asn Ser Ile Met Asp
210 215 220

Val Leu Lys Ile Ser Phe
225 230
(2) INFORMATION FOR SEQ ID NO: 145:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 98 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 145:

Val Thr Gly Glu Val Gly Asp Leu Lys Gln Gly Phe Ser Val Asn Ile
1 5 10 15

Glu Val Lys Ser Lys Thr Lys Ala Ile Leu Val Pro Val Ser Ser Leu
20 25 30

Val Met Asp Asp Ser Lys Asn Tyr Val Trp Ile Val Asp Glu Gln Gln
35 40 45

Lys Ala Lys Lys Val Glu Val Ser Leu Gly Asn Ala Asp Ala Glu Asn
50 55 60

Gln Glu Ile Thr Ser Gly Leu Thr Asn Gly Ala Lys Val Ile Ser Asn
65 70 75 80

Pro Thr Ser Ser Leu Glu Glu Gly Lys Glu Val Lys Ala Asp Glu Ala
85 90 95

Thr Asn



(2) INFORMATION FOR SEQ ID NO: 146:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 182 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 146:

Val Gly Leu Gln Ile Arg Ala Ile Phe Lys Arg Tyr Thr Asp Leu Ile
1 5 10 15

Glu Pro Met Ser Ile Asp Glu Ala Tyr Leu Asp Val Thr Glu Asn Lys
20 25 30

Leu Gly Ile Lys Ser Ala Val Lys Ile Ala Arg Leu Ile Gln Lys Asp
35 40 45

Ile Trp Gln Glu Leu His Leu Thr Ala Ser Ala Gly Val Ser Tyr Asn
50 55 60

Lys Phe Leu Ala Lys Met Ala Ser Asp Tyr Gln Lys Pro His Gly Leu
65 70 75 80

Thr Val Ile Leu Pro Glu Gln Ala Glu Asp Phe Leu Lys Gln Met Asp
85 90 95

Ile Ser Lys Phe His Gly Val Gly Lys Lys Thr Val Glu Arg Leu His
100 105 110

Gln Met Gly Val Phe Thr Gly Ala Asp Leu Leu Glu Val Pro Glu Val
115 120 125

Thr Leu Ile Asp Arg Phe Gly Arg Leu Gly Tyr Asp Leu Tyr Arg Lys
130 135 140

Ala Arg Gly Ile His Asn Ser Pro Val Lys Ser Asn His Ile Arg Lys
145 150 155 160

Ser Ile Gly Lys Glu Lys Thr Tyr Gly Lys Ile Leu Arg Ala Glu Glu
165 170 175

Asp Ile Lys Lys Glu Ser
180



(2) INFORMATION FOR SEQ ID NO: 147:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 343 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 147:

Val Asn Leu Pro Lys Arg Ala Phe Leu Asn Gly Arg Val Asp Leu Thr
1 5 10 15

Gln Ala Glu Ala Val Met Asp Ile Ile Arg Ala Lys Thr Asp Lys Ala
20 25 30

Met Asn Ile Ala Val Lys Gln Leu Asp Gly Ser Leu Ser Asp Leu Ile
35 40 45

Asn Asn Thr Arg Gln Glu Ile Leu Asn Thr Leu Ala Gln Val Glu Val
50 55 60

Asn Ile Asp Tyr Pro Glu Tyr Asp Asp Val Glu Glu Ala Thr Thr Ala
65 70 75 80

Val Val Arg Glu Lys Thr Met Glu Phe Glu Gln Leu Leu Thr Lys Leu
85 90 95

Leu Arg Thr Ala Arg Arg Gly Lys Ile Leu Arg Glu Gly Ile Ser Thr
100 105 110

Ala Ile Ile Gly Arg Pro Asn Val Gly Lys Ser Ser Leu Leu Asn Asn
115 120 125

Leu Leu Arg Glu Asp Lys Ala Ile Val Thr Asp Ile Ala Gly Thr Thr
130 135 140

Arg Asp Val Ile Glu Glu Tyr Val Asn Ile Asn Gly Val Pro Leu Lys
145 150 155 160

Leu Ile Asp Thr Ala Gly Ile Arg Glu Thr Asp Asp Ile Val Glu Gln
165 170 175

Ile Gly Val Glu Arg Ser Lys Lys Ala Leu Lys Glu Ala Asp Leu Val
180 185 190

Leu Leu Val Leu Asn Ala Ser Glu Pro Leu Thr Ala Gln Asp Arg Gln
195 200 205

Leu Leu Glu Ile Ser Gln Asp Thr Asn Arg Ile Ile Leu Leu Asn Lys
210 215 220

Thr Asp Leu Pro Glu Thr Ile Glu Thr Ser Lys Leu Pro Glu Asp Val
225 230 235 240

Ile Arg Ile Ser Val Leu Lys Asn Gln Asn Ile Asp Lys Ile Glu Glu
245 250 255

Arg Ile Asn Asn Leu Phe Phe Glu Asn Ala Gly Leu Val Glu Gln Asp
260 265 270

Ala Thr Tyr Leu Ser Asn Ala Arg His Ile Ser Leu Ile Glu Lys Ala
275 280 285

Val Glu Ser Leu Gln Ala Val Asn Gln Gly Leu Glu Leu Gly Met Pro
290 295 300

Val Asp Leu Leu Gln Val Asp Leu Thr Arg Thr Trp Glu Ile Leu Gly
305 310 315 320

Glu Ile Thr Gly Asp Ala Ala Pro Asp Glu Leu Ile Thr Gln Leu Phe
325 330 335

Ser Gln Phe Cys Leu Gly Lys
340



(2) INFORMATION FOR SEQ ID NO: 148:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 115 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 148:

Val Glu Ile Ser Val Gln Pro Pro Gly Lys Lys Ile Gln Ser Leu Asn
1 5 10 15

Leu Met Ser Gly Gly Glu Lys Ala Leu Ser Ala Leu Ala Leu Leu Phe
20 25 30

Ser Ile Ile Arg Val Lys Thr Ile Pro Phe Val Ile Leu Asp Glu Val
35 40 45

Glu Ala Ala Leu Asp Glu Ala Asn Val Lys Arg Phe Gly Asp Tyr Leu
50 55 60

Asn Arg Phe Asp Lys Asp Ser Gln Phe Ile Val Val Thr His Arg Lys
65 70 75 80

Gly Thr Met Ala Ala Ala Asp Ser Ile Tyr Gly Val Thr Met Gln Glu
85 90 95

Ser Gly Val Ser Lys Ile Val Ser Val Lys Leu Lys Asp Leu Glu Ser
100 105 110

Ile Glu Gly
115

(2) INFORMATION FOR SEQ ID NO: 149:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 231 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 149:

Val Thr Thr Val Ala Glu Phe Gly Asp Ser Ser Lys Leu Thr Val Gly
1 5 10 15

Glu Thr Ala Ile Ala Ile Gly Ser Pro Leu Gly Ser Glu Tyr Ala Asn
20 25 30

Thr Val Thr Gln Gly Ile Val Ser Ser Leu Asn Arg Asn Val Ser Leu
35 40 45

Lys Ser Glu Asp Gly Gln Ala Ile Ser Thr Lys Ala Ile Gln Thr Asp
50 55 60

Thr Ala Ile Asn Pro Gly Asn Ser Gly Gly Pro Leu Ile Asn Ile Gln
65 70 75 80

Gly Gln Val Ile Gly Ile Thr Ser Ser Lys Ile Ala Thr Asn Gly Gly
85 90 95

Thr Ser Val Glu Gly Leu Gly Phe Ala Ile Pro Ala Asn Asp Ala Ile
100 105 110

Asn Ile Ile Glu Gln Leu Glu Lys Asn Gly Lys Val Thr Arg Pro Ala
115 120 125

Leu Gly Ile Gln Met Val Asn Leu Ser Asn Val Ser Thr Ser Asp Ile
130 135 140

Arg Arg Leu Asn Ile Pro Ser Asn Val Thr Ser Gly Val Ile Val Arg
145 150 155 160

Ser Val Gln Ser Asn Met Pro Ala Asn Gly His Leu Glu Lys Tyr Asp
165 170 175

Val Ile Thr Lys Val Asp Asp Lys Glu Ile Ala Ser Ser Thr Asp Leu
180 185 190

Gln Ser Ala Leu Tyr Asn His Ser Ile Gly Asp Thr Ile Lys Ile Thr
195 200 205

Tyr Tyr Arg Asn Gly Lys Glu Glu Thr Thr Ser Ile Lys Leu Asn Lys
210 215 220

Ser Ser Gly Asp Leu Glu Ser
225 230



(2) INFORMATION FOR SEQ ID NO: 150:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 40 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 150:

Val Gln Arg Ser Met Leu Leu Pro Gly Gly Ile Leu Gly Met Thr Val
1 5 10 15

Trp Leu Ile Tyr Leu Leu Leu Lys Glu Pro Thr Asn Val Ile Val Ala
20 25 30

Val Asn Gln Ser Leu Lys Arg Ser
35 40



(2) INFORMATION FOR SEQ ID NO: 151:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 102 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 151:

Val Thr Met Glu Leu Asn Thr His Asn Ala Glu Ile Leu Leu Ser Ala
1 5 10 15

Ala Asn Lys Ser His Tyr Pro Gln Asp Glu Leu Pro Glu Ile Ala Leu
20 25 30

Ala Gly Arg Ser Asn Val Gly Lys Ser Ser Phe Ile Asn Thr Met Leu
35 40 45

Asn Arg Lys Asn Leu Ala Arg Thr Ser Gly Lys Pro Gly Lys Thr Gln
50 55 60

Leu Leu Asn Phe Phe Asn Ile Asp Asp Lys Met Arg Phe Val Asp Val
65 70 75 80

Pro Gly Tyr Gly Tyr Ala Arg Val Ser Lys Lys Glu Arg Glu Lys Trp
85 90 95

Gly Cys Met Ile Glu Glu
100



(2) INFORMATION FOR SEQ ID NO: 152:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 70 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 152:

Val Gln Met Tyr Glu Phe Leu Lys Tyr Tyr Glu Ile Pro Val Ile Ile
1 5 10 15

Val Ala Thr Lys Ala Asp Lys Ile Pro Arg Gly Lys Trp Asn Lys His
20 25 30

Glu Ser Ala Ile Lys Lys Lys Leu Asn Phe Asp Pro Ser Asp Asp Phe
35 40 45

Ile Leu Phe Ser Ser Val Ser Lys Ala Gly Met Asp Glu Ala Trp Asp
50 55 60

Ala Ile Leu Glu Lys Leu
65 70

(2) INFORMATION FOR SEQ ID NO: 153:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 59 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 153:

Val Phe Met Val Tyr Asn Cys Pro Lys Pro Val Tyr Ser Phe Leu Lys
1 5 10 15

Ser Ala Ile Asn Leu Met Ala Ala Ile Pro Ser Ile Val Tyr Gly Phe
20 25 30

Phe Gly Leu Gln Leu Leu Val Pro Trp Ile Lys Thr Phe Leu Gly Asn
35 40 45

Gly Met Ser Cys Pro Asn Gln Leu Arg Tyr Tyr
50 55

(2) INFORMATION FOR SEQ ID NO: 154:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 294 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 154:

Val Ile Ile Met Lys Phe Lys Lys Met Leu Thr Leu Ala Ala Ile Gly
1 5 10 15

Leu Ser Gly Phe Gly Leu Val Ala Cys Gly Asn Gln Ser Ala Ala Ser
20 25 30

Lys Gln Ser Ala Pro Gly Thr Ile Glu Val Ile Ser Arg Glu Asn Gly
35 40 45

Ser Gly Thr Arg Gly Ala Phe Thr Glu Ile Thr Gly Ile Leu Lys Lys
50 55 60

Asp Gly Asp Lys Lys Ile Asp Tyr Thr Ala Lys Thr Ala Val Ile Gln
65 70 75 80

Asn Ser Thr Glu Gly Val Leu Ser Ala Val Gln Gly Asn Ala Asn Ala
85 90 95

Ile Gly Tyr Ile Ser Leu Gly Ser Leu Thr Lys Ser Val Lys Ala Leu
100 105 110

Glu Ile Asp Gly Val Lys Ala Ser Arg Asp Thr Val Leu Asp Gly Glu
115 120 125

Tyr Pro Leu Gln Arg Pro Phe Asn Ile Val Trp Ser Ser Asn Leu Ser
130 135 140

Lys Leu Gly Gln Asp Phe Ile Ser Phe Ile His Ser Lys Gln Gly Gln
145 150 155 160

Gln Val Val Thr Asp Asn Lys Phe Ile Glu Ala Lys Thr Glu Thr Thr
165 170 175

Glu Tyr Thr Ser Gln His Leu Ser Gly Lys Leu Ser Val Val Gly Ser
180 185 190

Thr Ser Val Ser Ser Leu Met Glu Lys Leu Ala Glu Ala Tyr Lys Lys
195 200 205

Glu Asn Pro Glu Val Thr Ile Asp Ile Thr Ser Asn Gly Ser Ser Ala
210 215 220

Gly Ile Thr Ala Val Lys Glu Lys Thr Ala Asp Ile Gly Met Val Ser
225 230 235 240

Arg Glu Leu Thr Pro Glu Glu Gly Lys Ser Leu Thr His Asp Ala Ile
245 250 255

Ala Leu Asp Gly Ile Ala Val Val Val Asn Asn Asp Asn Lys Ala Ser
260 265 270

Gln Val Ser Met Ala Glu Leu Ala Asp Val Phe Ser Gly Lys Leu Thr
275 280 285

Thr Trp Asp Lys Ile Lys
290

(2) INFORMATION FOR SEQ ID NO: 155:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 158 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

242 



SUBSTITUTE SHEET (RULE 26) 



WO 98/19689 



PCT/US97/19226 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 155:

Val Ser Ser Ile Leu Gly Ala Gly Pro Phe Phe Gly Leu Ala His Glu
1 5 10 15

Ala Gln Leu Lys Ile Leu Glu Leu Thr Ala Gly Gln Val Ala Thr Met
20 25 30

Tyr Glu Ser Pro Val Gly Phe Arg His Gly Pro Lys Ser Leu Ile Asn
35 40 45

Asp Asn Thr Val Val Leu Val Phe Gly Thr Thr Thr Asp Tyr Thr Arg
50 55 60

Lys Tyr Asp Leu Asp Leu Val Arg Glu Val Ala Gly Asp Gln Ile Ala
65 70 75 80

Arg Arg Val Val Leu Leu Ser Asp Gln Ala Phe Gly Leu Glu Asn Val
85 90 95

Lys Glu Val Ala Leu Gly Cys Gly Gly Val Leu Asn Asp Ile Tyr Arg
100 105 110

Val Phe Pro Tyr Ile Val Tyr Ala Gln Leu Phe Ala Leu Leu Thr Ser
115 120 125

Leu Lys Val Glu Asn Lys Pro Asp Thr Pro Ser Pro Thr Gly Thr Val
130 135 140

Asn Arg Val Val Gln Gly Val Ile Ile His Glu Tyr Gln Lys
145 150 155



(2) INFORMATION FOR SEQ ID NO: 156:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 271 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 156:

Val Lys Pro Gly Asp Phe Val Ile Val Pro Phe Thr His Gly Cys Gly
1 5 10 15

Glu Cys Asp Ala Cys Leu Ala Gly Phe Asp Gly Ser Cys Asp Asn His
20 25 30

Ile Gly Asn Asn Leu Gly Gly Asp Phe Gln Ala Glu Tyr Ile Arg Phe
35 40 45

His Tyr Ala Asn Trp Ala Leu Val Lys Ile Pro Gly Gln Pro Ser Asp
50 55 60




Tyr Thr Glu Gly Met Leu Lys Ser Leu Leu Thr Leu Ala Asp Val Met
65 70 75 80

Pro Thr Gly Tyr His Ala Ala Arg Val Ala Asn Val Gln Lys Gly Asp
85 90 95

Lys Val Val Val Ile Gly Asp Gly Ala Val Gly Gln Cys Ala Val Ile
100 105 110

Ala Ala Lys Met Arg Gly Ala Ser Gln Ile Ile Leu Met Ser Arg His
115 120 125

Glu Asp Arg Gln Lys Met Ala Met Glu Ser Gly Ala Thr Ala Val Val
130 135 140

Ala Glu Arg Gly Gln Glu Gly Ile Thr Lys Val Arg Glu Ile Leu Gly
145 150 155 160

Gly Gly Ala Asp Ala Ala Leu Glu Cys Val Gly Thr Glu Ala Ala Ile
165 170 175

Glu Gln Ala Leu Gly Val Leu His Asn Gly Gly Arg Met Gly Phe Val
180 185 190

Gly Val Pro His Tyr Asn Asn Arg Ala Leu Gly Ser Thr Phe Met Gln
195 200 205

Asn Ile Ser Val Ala Gly Gly Ala Ala Ser Ala Thr Thr Tyr Asp Lys
210 215 220

Gln Phe Leu Leu Lys Ala Val Leu Asp Gly Asp Ile Asn Pro Gly Arg
225 230 235 240

Val Phe Thr Ser Ser Tyr Lys Leu Glu Asp Ile Asp Gln Ala Tyr Lys
245 250 255

Asp Met Asp Glu Arg Lys Thr Ile Lys Ser Met Ile Val Ile Glu
260 265 270



(2) INFORMATION FOR SEQ ID NO: 157:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 122 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 157:

Val Arg Lys Ser Arg Val Asn Asn Ser Gln Gln Met Leu Gln Ala Leu
1 5 10 15

Glu Glu Gln Asp Leu Thr Lys Ala Glu His Tyr Phe Ala Lys Ala Leu
20 25 30

Glu Asn Asp Ser Ser Asp Leu Leu Tyr Glu Leu Ala Thr Tyr Leu Glu
35 40 45

Gly Ile Gly Phe Tyr Pro Gln Ala Lys Glu Ile Tyr Leu Lys Ile Val
50 55 60

Glu Glu Phe Pro Glu Val His Leu Asn Leu Ala Ala Met Ala Ser Glu
65 70 75 80

Asp Gly Gln Ile Glu Lys Ala Phe Asn Tyr Leu Glu Glu Ile Gln Ala
85 90 95

Asp Ser Asp Trp Tyr Val Ser Leu Phe Gly Ser Glu Gly Arg Pro Ile
100 105 110

Pro Ala Gly Arg Phe Asp Arg Cys Gly Thr
115 120



(2) INFORMATION FOR SEQ ID NO: 158:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 317 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 158:

Val Thr Gly Met Ser Arg Ser Leu Ala Leu Lys Ala Asp Leu Tyr Gln
1 5 10 15

Leu Glu Gly Leu Thr Asp Val Ala Arg Glu Lys Leu Leu Glu Ala Leu
20 25 30

Thr Tyr Ser Lys Asp Ser Leu Leu Ile Leu Gly Leu Ala Lys Leu Asp
35 40 45

Ser Glu Leu Glu Asn Tyr Gln Ala Ala Ile Gln Ala Tyr Ala Gln Leu
50 55 60

Asp Asn Arg Ser Ile Tyr Glu Gln Thr Gly Ile Ser Thr Tyr Gln Arg
65 70 75 80

Ile Gly Phe Ala Tyr Ala Gln Leu Gly Lys Phe Glu Thr Ala Thr Glu
85 90 95

Phe Leu Glu Lys Ala Leu Glu Leu Glu Tyr Asp Asp Leu Thr Ala Phe
100 105 110

Glu Leu Ala Ser Leu Tyr Phe Asp Gln Glu Glu Tyr Gln Lys Ala Thr
115 120 125

Leu Tyr Phe Lys Gln Leu Asp Thr Ile Ser Pro Asp Phe Glu Gly Tyr
130 135 140

Glu Tyr Gly Tyr Ser Gln Ala Leu His Lys Glu His Gln Val Gln Glu
145 150 155 160

Ala Leu Arg Ile Ala Lys Gln Gly Leu Glu Lys Asn Pro Phe Glu Thr
165 170 175

Arg Leu Leu Leu Ala Ala Ser Gln Phe Ser Tyr Glu Leu His Asp Ala
180 185 190

Ser Gly Ala Glu Asn Tyr Leu Leu Thr Ala Lys Glu Asp Ala Glu Asp
195 200 205

Thr Glu Glu Ile Leu Leu Arg Leu Ala Thr Ile Tyr Leu Glu Gln Glu
210 215 220

Arg Tyr Glu Asp Ile Leu Asp Leu Gln Ser Glu Glu Pro Glu Asn Leu
225 230 235 240

Leu Thr Lys Trp Met Ile Ala Arg Ser Tyr Gln Glu Met Asp Asp Leu
245 250 255

Asp Thr Ala Tyr Glu His Tyr Gln Glu Leu Thr Gly Asp Leu Lys Asp
260 265 270

Asn Pro Glu Phe Leu Glu His Tyr Ile Tyr Leu Leu Arg Glu Leu Gly
275 280 285

His Phe Glu Glu Ala Lys Val His Ala His Thr Tyr Leu Lys Leu Val
290 295 300

Pro Asp Asp Val Gln Met Gln Glu Leu Phe Glu Arg Leu
305 310 315



(2) INFORMATION FOR SEQ ID NO: 159:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 77 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 159:

Val Glu Lys Ala Gly Val Val Ile Ala Ile Asn His Asn Glu Ile Pro
1 5 10 15

Trp Glu Thr Ile Asp Gly Lys Gly Val Lys Val Ile Val Leu Phe Ala
20 25 30

Val Gly Asp Asp Thr Glu Ala Ala Arg Glu His Leu Lys Thr Leu Ser
35 40 45

Leu Phe Ala Arg Lys Leu Gly Asn Asp Glu Val Val Ala Lys Leu Val
50 55 60

Arg Ala Gln Thr Ser Asp Asp Val Ile Ala Ala Phe Cys
65 70 75

(2) INFORMATION FOR SEQ ID NO: 160:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 46 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 160:

Val Ser Asp Phe His Asp Phe Ser Asp Arg Glu Val Arg Trp Leu Ser
1 5 10 15

Pro Glu Glu Phe Lys Asn Tyr Pro Leu Ala Lys Pro Gln Gln Lys Ile
20 25 30

Trp Gln Ala Tyr Ala Gln Ala Asn Leu Asp Ser Ser Gln Asp
35 40 45



(2) INFORMATION FOR SEQ ID NO: 161:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 96 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 161:

Val Asn Phe Glu Lys Lys Ala Gln Thr Gln Ile Ala Gln Ile Val Gln
1 5 10 15

Asn Gly Trp Asp Lys Leu Pro Ile Cys Met Ala Lys Thr Gln Tyr Ser
20 25 30

Phe Ser Asp Asn Pro Asn Ala Leu Gly Ala Pro Glu Asn Phe Glu Ile
35 40 45

Thr Ile Arg Glu Leu Val Pro Lys Leu Gly Ala Gly Phe Ile Val Ala
50 55 60

Leu Thr Gly Asp Val Met Thr Met Pro Gly Leu Pro Lys Arg Pro Ala
65 70 75 80

Ala Leu Asn Met Asp Val Glu Ser Asp Gly Thr Val Leu Gly Leu Phe
85 90 95



(2) INFORMATION FOR SEQ ID NO: 162:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 292 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 162:

Val Lys Lys Arg Lys Lys Leu Ala Leu Ser Leu Ile Ala Phe Trp Leu
1 5 10 15

Thr Ala Cys Leu Val Gly Cys Ala Ser Trp Ile Asp Arg Gly Glu Ser
20 25 30

Ile Thr Ala Val Gly Ser Thr Ala Leu Gln Pro Leu Val Glu Val Ala
35 40 45

Ala Asp Glu Phe Gly Thr Ile His Val Gly Lys Thr Val Asn Val Gln
50 55 60

Gly Gly Ser Ser Gly Thr Gly Leu Ser Gln Val Gln Ser Gly Ala Val
65 70 75 80

Asp Ile Gly Asn Ser Asp Val Phe Ala Glu Glu Lys Asp Gly Ile Asp
85 90 95

Ala Ser Ala Leu Val Asp His Lys Val Ala Val Ala Gly Leu Ala Leu
100 105 110

Ile Val Asn Lys Glu Val Asp Val Asp Asn Leu Thr Thr Glu Gln Leu
115 120 125

Arg Gln Ile Phe Ile Gly Glu Val Thr Asn Trp Lys Glu Val Gly Gly
130 135 140

Lys Asp Leu Pro Ile Ser Val Ile Asn Arg Ala Ala Gly Ser Gly Ser
145 150 155 160

Arg Ala Thr Phe Asp Thr Val Ile Met Glu Gly Gln Ser Ala Met Gln
165 170 175

Ser Gln Glu Gln Asp Ser Asn Gly Ala Val Lys Ser Ile Val Ser Lys
180 185 190

Ser Pro Gly Ala Ile Ser Tyr Leu Ser Leu Thr Tyr Ile Asp Asp Ser
195 200 205

Val Lys Ser Met Lys Leu Asn Gly Tyr Asp Leu Ser Pro Glu Asn Ile
210 215 220

Ser Ser Asn Asn Trp Pro Leu Trp Ser Tyr Glu His Met Tyr Thr Leu
225 230 235 240

Gly Gln Pro Asn Glu Leu Ala Ala Glu Phe Leu Asn Phe Val Leu Ser
245 250 255

Asp Glu Thr Gln Glu Gly Ile Val Lys Gly Leu Lys Tyr Ile Pro Ile
260 265 270

Lys Glu Met Lys Val Glu Lys Asp Ala Ala Gly Thr Val Thr Val Leu
275 280 285

Glu Gly Arg Gln
290



(2) INFORMATION FOR SEQ ID NO: 163: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 71 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 163:

Val Gln Pro Thr Gln Ala Glu Gln Pro Ser Thr Pro Lys Glu Ser Ser
1 5 10 15

Gln Gln Glu Asn Pro Lys Glu Asp Arg Gly Ala Glu Glu Thr Pro Lys
20 25 30

Gln Glu Asp Glu Gln Pro Ala Glu Ala Gln Glu Ile Lys Val Glu Glu
35 40 45

Pro Val Glu Ser Ile Glu Glu Thr Val Ile Gln Pro Val Glu Gln Pro
50 55 60

Lys Val Glu Thr Pro Ala Val
65 70



(2) INFORMATION FOR SEQ ID NO: 164: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 465 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 164:

Val Leu Leu Lys Met Asp Gly Tyr Arg Tyr Val Gly Tyr Leu Ser Gly
1 5 10 15

Asp Ile Leu Lys Thr Leu Gly Leu Asp Thr Val Leu Glu Glu Thr Ser
20 25 30

Ala Lys Pro Gly Glu Val Thr Val Val Glu Val Glu Thr Pro Gln Ser
35 40 45

Thr Thr Asn Gln Glu Gln Ala Arg Thr Glu Asn Gln Val Val Glu Thr
50 55 60

Glu Glu Ala Pro Lys Glu Glu Ala Pro Lys Thr Glu Glu Ser Pro Lys
65 70 75 80

Glu Glu Pro Lys Ser Glu Val Lys Pro Thr Asp Asp Thr Leu Pro Lys
85 90 95

Val Glu Glu Gly Lys Glu Asp Ser Ala Glu Pro Ser Pro Val Glu Glu
100 105 110

Val Gly Gly Glu Val Glu Ser Lys Pro Glu Glu Lys Val Ala Val Lys
115 120 125

Pro Glu Ser Gln Pro Ser Asp Lys Pro Ala Glu Glu Ser Lys Val Glu
130 135 140

Pro Pro Val Glu Gln Ala Lys Val Pro Glu Gln Pro Val Gln Pro Thr
145 150 155 160

Gln Ala Glu Gln Pro Ser Thr Pro Lys Glu Ser Ser Gln Gln Glu Asn
165 170 175

Pro Lys Glu Asp Arg Gly Ala Glu Glu Thr Pro Lys Gln Glu Asp Glu
180 185 190

Gln Pro Ala Glu Ala Gln Glu Ile Lys Val Glu Glu Pro Val Glu Ser
195 200 205

Lys Glu Glu Thr Val Asn Gln Pro Val Glu Gln Pro Lys Val Glu Thr
210 215 220

Pro Ala Val Glu Lys Gln Thr Glu Pro Thr Glu Glu Pro Lys Val Glu
225 230 235 240

Val Thr Ser Ile Pro Gln Thr Thr Arg Tyr Glu Glu Asp Leu Thr Lys
245 250 255

Glu His Gly Thr Arg Glu Val Val Lys Glu Gly Lys Asn Gly Ser Arg
260 265 270

Thr Val Thr Thr Pro Tyr Ile Leu Asn Ala Thr Asp Gly Thr Thr Thr
275 280 285

Glu Gly Thr Ser Thr Thr Asp Glu Ala Glu Met Glu Lys Glu Val Val
290 295 300

Arg Val Gly Thr Lys Pro Lys Glu Lys Leu Ala Pro Val Leu Ser Leu
305 310 315 320

Thr Ser Val Thr Asp Asn Ala Met Leu Arg Ser Ala Arg Leu Thr Tyr
325 330 335

His Leu Glu Asn Thr Asp Ser Val Asp Val Lys Lys Ile His Ala Glu
340 345 350

Ile Lys Asn Gly Asp Lys Val Val Lys Thr Ile Asp Leu Ser Lys Glu
355 360 365

Arg Leu Ser Asp Ala Val Asp Gly Leu Glu Leu Tyr Lys Asp Tyr Lys
370 375 380

Ile Val Thr Ser Met Thr Tyr Asp Arg Gly Asn Gly Glu Glu Thr Ser
385 390 395 400

Thr Leu Glu Glu Thr Pro Leu Arg Leu Asp Leu Lys Lys Val Glu Leu
405 410 415

Lys Asn Ile Gly Ser Thr Asn Leu Val Lys Val Asn Glu Asp Gly Thr
420 425 430

Glu Val Ala Ser Asp Phe Leu Thr Ser Lys Pro Val Asp Val Gln Asn
435 440 445

Tyr Tyr Leu Lys Val Thr Ser Arg Asp Asn Lys Val Val Ser Pro Pro
450 455 460

Ser
465



(2) INFORMATION FOR SEQ ID NO: 165: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 152 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: None 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 165: 



Val Gln Leu Tyr Lys Ala Trp Ser Glu Ile Gly Ser Val Val His Thr
1 5 10 15

His Ser Thr Glu Ala Val Ala Trp Ala Gln Ala Gly Arg Asp Ile Pro
20 25 30

Phe Tyr Gly Thr Thr His Ala Asp Tyr Phe Tyr Gly Ser Ile Pro Cys
35 40 45

Ala Arg Ser Leu Thr Lys Asp Glu Val Glu Val Ala Tyr Glu Lys Asp
50 55 60

Thr Gly Leu Val Ile Val Glu Glu Phe Glu His Arg Gly Leu Asn Pro
65 70 75 80

Val Glu Val Pro Gly Ile Val Val Arg Asn His Gly Pro Phe Thr Trp
85 90 95

Gly Lys Asn Pro Glu Asn Ala Val Tyr His Ser Val Val Leu Glu Glu
100 105 110

Val Ser Lys Met Asn Arg Phe Thr Glu Gln Ile Asn Pro Arg Val Glu
115 120 125

Pro Ala Pro Gln Tyr Ile Leu Glu Lys His Tyr Gln Arg Lys His Gly
130 135 140

Pro Asn Ala Tyr Tyr Gly Gln Lys
145 150

(2) INFORMATION FOR SEQ ID NO: 166: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 74 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 166:

Val Val Lys Ala Ile Gln Asp Gly Lys Ala Lys Leu Val Phe Leu Ala
1 5 10 15

His Asp Ala Gly Pro Asn Leu Thr Lys Lys Ile Gln Asp Lys Ser His
20 25 30

Tyr Tyr Gln Val Glu Ile Val Thr Val Phe Ser Thr Leu Glu Leu Ile
35 40 45

Ile Ala Val Gly Lys Ser Arg Lys Val Leu Ala Val Thr Asp Ala Gly
50 55 60

Phe Thr Lys Lys Met Arg Ser Leu Met Glu
65 70

(2) INFORMATION FOR SEQ ID NO: 167: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 190 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 167:

Val Ala Asp Asp Asp Gln Cys Ile Phe Leu Cys His Asn His Arg Ala
1 5 10 15

Gln Glu Ser Ile Glu Phe Glu Lys Met Ile Asp Gln Leu Ser Lys Tyr
20 25 30

Tyr Ser Cys Arg Ile Leu Thr Glu Lys Asp Ile Pro Ser Ile Leu Ser
35 40 45

Leu Tyr Glu Ser Asn Pro Leu Tyr Phe Gln His Cys Pro Pro Glu Pro
50 55 60

Asn Phe Ala Thr Val Lys Glu Asp Met Leu Cys Leu Pro Glu Gly Lys
65 70 75 80

Ala Lys Ala Asp Lys Phe Phe Val Gly Phe Trp Asn Gly Phe Asp Leu
85 90 95

Val Ala Val Met Asp Phe Val Tyr Ala Tyr Pro Asp Glu Glu Thr Val
100 105 110

Phe Ile Gly Leu Phe Met Val Asp Gln Ala Tyr Gln Arg Lys Gly Ile
115 120 125

Gly Ser His Ile Val Thr Glu Ala Leu Ala Tyr Phe Ala Lys Asn Phe
130 135 140

Arg Lys Ala Arg Leu Ala Tyr Val Lys Gly Asn Pro Gln Ser Gln His
145 150 155 160

Phe Trp Glu Lys Gln Gly Phe Lys Ser Ile Gly Cys Glu Val Lys Gln
165 170 175

Glu Leu Tyr Thr Val Val Ile Val Glu Gln Ser Leu Glu Asp
180 185 190

(2) INFORMATION FOR SEQ ID NO: 168:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 215 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 168:

Val Ala Leu Thr Pro Leu Leu Lys Glu Glu Gly Val Ala Asp Ile Pro
1 5 10 15

Ala Tyr Lys Asp Tyr Tyr Val Pro Met Asn Lys Ala Leu Trp Lys Asp
20 25 30

Leu Glu Leu Lys Lys Ile Ser Lys Gln Glu Leu Val Asn Thr Arg Phe
35 40 45

Ser Arg Leu Phe Ala His Phe Gly Gln Glu Lys Asp Gly Ser Phe Leu
50 55 60

Ala Gln Arg Tyr Gln Phe Tyr Leu Ala Gln Gln Gly Gln Thr Leu Ser
65 70 75 80

Gly Ala His Asp Leu Leu Asp Ser Leu Ile Glu Arg Asp Tyr Asn Leu
85 90 95

Tyr Ala Ala Thr Asn Gly Ile Thr Ala Ile Gln Thr Gly Arg Leu Ala
100 105 110

Gln Ser Gly Leu Ala Pro Tyr Phe Asn Gln Val Phe Ile Ser Glu Gln
115 120 125

Leu Gln Thr Gln Lys Pro Asp Ala Leu Phe Tyr Glu Lys Ile Gly Gln
130 135 140

Gln Ile Ala Gly Phe Ser Lys Glu Lys Thr Leu Met Ile Gly Asp Ser
145 150 155 160

Leu Thr Ala Asp Ile Gln Gly Gly Asn Asn Ala Gly Ile Asp Thr Ile
165 170 175

Trp Tyr Asn Pro His His Leu Glu Asn His Thr Gln Ala Gln Pro Thr
180 185 190

Tyr Glu Val Tyr Ser Tyr Gln Asp Leu Leu Asp Cys Leu Asp Lys Asn
195 200 205

Ile Leu Glu Lys Ile Thr Phe
210 215

(2) INFORMATION FOR SEQ ID NO: 169: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 299 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: None 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 169: 

Val Ala Ala Leu Ser Gln Gln Asp Val Pro Lys Ala Leu Ser Cys Leu
1 5 10 15

Asn Leu Leu Phe Asp Asn Gly Lys Ser Met Thr Arg Phe Val Thr Asp
20 25 30

Leu Leu His Tyr Leu Arg Asp Leu Leu Ile Val Gln Thr Gly Gly Glu
35 40 45

Asn Thr His His Ser Ser Val Phe Val Glu Asn Leu Ala Leu Pro Gln
50 55 60

Lys Asn Leu Phe Glu Met Ile Arg Leu Ala Thr Val Asn Leu Ala Asp
65 70 75 80

Ile Lys Ser Ser Leu Gln Pro Lys Ile Tyr Ala Glu Met Met Thr Val
85 90 95

Arg Leu Ala Glu Ile Lys Pro Glu Pro Ala Leu Ser Gly Ala Val Glu
100 105 110

Asn Arg Ile Ala Thr Leu Arg Gln Glu Val Ala Arg Leu Lys Gln Glu
115 120 125

Leu Ser Asn Ala Gly Ala Val Pro Lys Gln Val Ala Pro Ala Pro Ser
130 135 140

Arg Pro Ala Thr Gly Lys Thr Val Tyr Arg Val Asp Arg Asn Lys Val
145 150 155 160

Gln Ser Ile Leu Gln Glu Ala Val Glu Asn Pro Asp Leu Ala Arg Gln
165 170 175

Asn Leu Ile Arg Leu Gln Asn Ala Trp Gly Glu Val Ile Glu Ser Leu
180 185 190

Gly Gly Pro Asp Lys Ala Leu Leu Val Gly Ser Gln Pro Val Ala Ala
195 200 205

Asn Glu His His Ala Ile Leu Ala Phe Glu Ser Asn Phe Asn Ala Gly
210 215 220

Gln Thr Met Lys Arg Asp Asn Leu Asn Thr Met Phe Gly Asn Ile Leu
225 230 235 240

Ser Gln Ala Ala Gly Phe Ser Pro Glu Ile Leu Ala Ile Ser Met Glu
245 250 255

Glu Trp Lys Glu Val Arg Ala Ala Phe Ser Ala Lys Ala Lys Ser Ser
260 265 270

Gln Thr Glu Lys Glu Val Glu Glu Ser Leu Ile Pro Glu Gly Phe Glu
275 280 285

Phe Leu Ala Asp Lys Val Lys Val Glu Glu Asp
290 295



(2) INFORMATION FOR SEQ ID NO: 170:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 147 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 170:

Val Pro Leu Val Ile Leu Met Ile Gly Met Leu Ala Gly Ser Ile Ser
1 5 10 15

His Gln Val Met His Trp Gly Thr Phe Leu Ala Thr Thr Pro Ile Met
20 25 30

Leu Val Ala Gly Lys Pro Tyr Ile Gln Ser Ala Trp Ala Ser Phe Lys
35 40 45

Lys His Asn Ala Asn Met Asp Thr Leu Val Ala Leu Gly Thr Leu Val
50 55 60

Ala Tyr Phe Tyr Ser Leu Val Ala Leu Phe Ala Gly Leu Pro Val Tyr
65 70 75 80

Phe Glu Ser Ala Gly Phe Ile Leu Phe Phe Val Leu Leu Gly Ala Val
85 90 95

Phe Glu Glu Lys Met Arg Lys Asn Thr Ser Gln Ala Val Glu Lys Leu
100 105 110

Leu Asp Leu Gln Ala Lys Thr Ala Glu Val Leu Ser Asp Asp Ser Tyr
115 120 125

Val Gln Val Pro Leu Glu Gln Val Lys Val Arg Asp Leu Asp Ser Ser
130 135 140

Ala Ser Arg
145

(2) INFORMATION FOR SEQ ID NO: 171:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 73 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 171:

Val Thr Glu Asn Ala Glu Ala Ala Ala Tyr Phe Thr Asp Gln Val Asp
1 5 10 15

Ser Ala Ala Val Tyr Val Asn Ala Ser Thr Arg Phe Thr Asp Gly Gly
20 25 30

Gln Phe Gly Leu Gly Cys Glu Met Gly Ile Ser Thr Gln Lys Leu His
35 40 45

Ala Arg Gly Pro Met Gly Leu Lys Glu Leu Thr Ser Tyr Lys Tyr Val
50 55 60

Val Ala Gly Asp Gly Gln Ile Arg Glu
65 70

(2) INFORMATION FOR SEQ ID NO: 172: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 94 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: None 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 172: 

Val Asp Leu Pro Gln Gln Phe His Leu Gly Ser Ile Thr Lys Thr Phe
1 5 10 15

Gln Trp Leu Val Asp Ile Asn Asn Leu Val Phe Lys Gly Ser Ile Pro
20 25 30

Ile Val Ser Leu Leu Phe Ile Tyr Cys Leu Gly Val Asn Ile Ala Lys
35 40 45

Ile Tyr Lys Val Asp Thr Val Ser Ala Gly Leu Val Ser Leu Ala Ser
50 55 60

Phe Val Ile Ser Ile Gly Ser Thr Val Thr Lys Ser Phe Pro Leu Ala
65 70 75 80

Asn Val Gly Asp Val Lys Leu Asp Gln Ile Leu Thr Trp Asn
85 90

(2) INFORMATION FOR SEQ ID NO:173: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 330 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: None 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:173: 

Val Ser Leu Arg Leu Ile Tyr Ser Ile Phe Lys Lys Met Arg Lys Asn
1 5 10 15

Met Lys Ile Ser His Met Lys Lys Asp Glu Leu Phe Glu Gly Phe Tyr
20 25 30

Leu Ile Lys Ser Ala Asp Leu Arg Gln Thr Arg Ala Gly Lys Asn Tyr
35 40 45

Leu Ala Phe Thr Phe Gln Asp Asp Ser Gly Glu Ile Asp Gly Lys Leu
50 55 60

Trp Asp Ala Gln Pro His Asn Ile Glu Ala Phe Thr Ala Gly Lys Val
65 70 75 80

Val His Met Lys Gly Arg Arg Glu Val Tyr Asn Asn Thr Pro Gln Val
85 90 95

Asn Gln Ile Thr Leu Arg Leu Pro Gln Ala Gly Glu Pro Asn Asp Pro
100 105 110

Ala Asp Phe Lys Val Lys Ser Pro Val Asp Val Lys Glu Ile Arg Asp
115 120 125

Tyr Met Ser Gln Met Ile Phe Lys Ile Glu Asn Pro Val Trp Gln Arg
130 135 140

Ile Val Arg Asn Leu Tyr Thr Lys Tyr Asp Lys Glu Phe Tyr Ser Tyr
145 150 155 160

Pro Ala Ala Lys Thr Asn His His Ala Phe Glu Thr Gly Leu Ala Tyr
165 170 175

His Thr Ala Thr Met Val Arg Leu Ala Asp Ala Ile Ser Glu Val Tyr
180 185 190

Pro Gln Leu Asn Lys Ser Leu Leu Tyr Ala Gly Ile Met Leu His Asp
195 200 205

Leu Ala Lys Val Ile Glu Leu Thr Gly Pro Asp Gln Thr Glu Tyr Thr
210 215 220

Val Arg Gly Asn Leu Leu Gly His Ile Ala Leu Ile Asp Ser Glu Ile
225 230 235 240

Thr Lys Thr Val Met Glu Leu Gly Ile Asp Asp Thr Lys Glu Glu Val
245 250 255

Val Leu Leu Arg His Val Ile Leu Lys Ser Thr Thr Ala Cys Leu Asn
260 265 270

Met Glu Ile Pro Val Arg Pro Arg Ile Met Glu Ala Glu Ile Ile His
275 280 285

Met Ile Asp Asn Leu Asp Ala Ser Met Met Met Met Ser Thr Ala Leu
290 295 300

Ala Leu Val Asp Lys Gly Glu Met Thr Asn Lys Ile Phe Ala Met Asp
305 310 315 320

Asn Arg Ser Phe Tyr Lys Pro Asp Leu Asp
325 330



(2) INFORMATION FOR SEQ ID NO: 174:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 137 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: None 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 174: 



Val Trp Lys Lys Lys Lys Val Lys Ala Gly Val Leu Leu Tyr Ala Val
1 5 10 15

Thr Ile Ala Ala Ile Phe Ser Leu Leu Leu Gln Phe Tyr Leu Asn Arg
20 25 30

Gln Val Ala His Tyr Gln Asp Tyr Ala Leu Asn Lys Glu Lys Leu Val
35 40 45

Ala Phe Ala Met Ala Lys Arg Thr Lys Asp Lys Val Glu Gln Glu Ser
50 55 60

Gly Glu Gln Val Phe Asn Leu Gly Gln Val Ser Tyr Gln Asn Lys Lys
65 70 75 80

Thr Gly Leu Val Thr Arg Val Arg Thr Asp Lys Ser Gln Tyr Glu Phe
85 90 95

Leu Phe Pro Ser Val Lys Ile Lys Glu Glu Lys Arg Asp Lys Lys Glu
100 105 110

Glu Val Ala Thr Asp Ser Ser Glu Lys Val Glu Lys Lys Lys Ser Glu
115 120 125

Glu Lys Pro Glu Lys Lys Glu Asn Ser
130 135
125 



(2) INFORMATION FOR SEQ ID NO: 175:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 163 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear



(ii) MOLECULE TYPE: None 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 175: 

Val Asp Gly Lys Phe Gly Lys His Val Glu Gln Ile Pro Glu Gly Ala
1 5 10 15

Glu Val Ile Asp Tyr Thr Gly Tyr Ser Ile Ala Pro Gly Leu Val Asp
20 25 30

Thr His Ile His Gly Tyr Ala Gly Val Asp Val Met Asp Asn Asn Ile
35 40 45

Glu Gly Thr Leu His Thr Met Ser Glu Gly Leu Leu Ser Thr Gly Val
50 55 60

Thr Ser Phe Leu Pro Thr Thr Leu Thr Ala Thr Tyr Glu Gln Leu Leu
65 70 75 80

Ala Val Thr Glu Asn Leu Gly Asn His Tyr Lys Glu Ala Thr Gly Ala
85 90 95

Lys Ile Arg Gly Ile Tyr Tyr Glu Gly Pro Tyr Phe Thr Glu Thr Phe
100 105 110

Lys Gly Ala Gln Asn Pro Thr Tyr Met Arg Asp Pro Gly Val Glu Glu
115 120 125

Phe His Ser Trp Gln Lys Ala Ala Asn Gly Leu Leu Asn Lys Ile Arg
130 135 140

Leu His Gln Asn Val Met Gly Trp Lys Thr Leu Phe Val Gln Leu Arg
145 150 155 160

Ala Lys Val



(2) INFORMATION FOR SEQ ID NO: 176:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 234 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 176:

Val Arg Arg Ile Glu Glu Lys Cys Lys Leu Ile Ala Gln Leu Asp Thr
1 5 10 15

Lys Thr Val Tyr Ser Phe Met Glu Ser Val Ile Ser Ile Glu Lys Tyr
20 25 30

Val Arg Ala Ala Lys Glu Tyr Gly Tyr Thr His Leu Ala Met Met Asp
35 40 45

Ile Asp Asn Leu Tyr Gly Ala Phe Asp Phe Leu Glu Ile Thr Lys Lys
50 55 60

Tyr Gly Ile His Pro Leu Leu Gly Leu Glu Met Thr Val Phe Val Asp
65 70 75 80

Asp Gln Glu Val Asn Leu Arg Phe Leu Ala Leu Ser Ser Val Gly Tyr
85 90 95

Gln Gln Leu Met Lys Leu Ser Thr Ala Lys Met Gln Gly Glu Lys Thr
100 105 110

Trp Ser Val Leu Ser Gln Tyr Leu Glu Asp Ile Ala Val Ile Val Pro
115 120 125

Tyr Phe Asp Arg Val Glu Ser Leu Glu Leu Gly Cys Asp Tyr Tyr Ile
130 135 140

Gly Val Tyr Pro Glu Thr Leu Ala Ser Glu Phe His His Pro Ile Leu
145 150 155 160

Pro Leu Tyr Arg Val Asn Ala Phe Glu Ser Arg Asp Arg Glu Val Leu
165 170 175

Gln Val Leu Thr Ala Ile Lys Glu Asn Leu Pro Leu Arg Glu Val Pro
180 185 190

Leu Arg Ser Arg Gln Asp Val Phe Ile Ser Ala Ser Ser Leu Glu Lys
195 200 205

Leu Phe Gln Glu Arg Phe Pro Ala Ser Phe Gly Gln Phe Arg Lys Ala
210 215 220

Tyr Phe Arg His Phe Leu Arg Leu Gly Tyr
225 230

(2) INFORMATION FOR SEQ ID NO: 177: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 130 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 177:

Val Val Glu Arg Ile Lys Ile Ala Arg Ser Tyr Gly Asp Leu Ser Glu
1 5 10 15

Asn Ser Glu Tyr Glu Ala Ala Lys Asp Glu Gln Ala Phe Val Glu Gly
20 25 30

Gln Ile Ser Ser Leu Glu Thr Lys Ile Arg Tyr Ala Glu Ile Val Asn
35 40 45

Ser Asp Ala Val Ala Gln Asp Glu Val Ala Ile Gly Lys Thr Val Thr
50 55 60

Ile Gln Glu Ile Gly Glu Asp Glu Glu Glu Val Tyr Ile Ile Val Gly
65 70 75 80

Ser Ala Gly Ala Asp Ala Phe Ala Gly Lys Val Ser Asn Glu Ser Pro
85 90 95

Ile Gly Gln Ala Leu Ile Gly Lys Lys Thr Gly Asp Thr Ala Thr Ile
100 105 110

Glu Thr Pro Val Gly Ser Tyr Asp Val Lys Ile Leu Lys Val Glu Lys
115 120 125

Thr Ala
130

(2) INFORMATION FOR SEQ ID NO: 178:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 79 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 178:

Val Asp Phe Ile Gly Gly Leu Ser Ala Leu Glu Gln Lys Gly Tyr Gln
1 5 10 15

Lys Gly Asp Glu Ile Leu Ile Asn Ser Ile Pro Arg Ala Leu Thr Glu
20 25 30

Thr Asp Lys Val Cys Ser Ser Val Asn Ile Gly Ser Thr Lys Ser Gly
35 40 45

Ile Asn Met Thr Ala Val Ala Asp Met Gly Arg Ile Tyr Gln Gly Asn
50 55 60

Gly Lys Ser Phe Arg Tyr Gly Ser Gly Gln Val Gly Cys Ile Arg
65 70 75

(2) INFORMATION FOR SEQ ID NO: 179:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 130 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 179:

Val Val Thr Pro Ala Asn Tyr Asn Thr Pro Ala Gln Ile Val Ile Ala
1 5 10 15

Gly Glu Val Val Ala Val Asp Arg Ala Val Glu Leu Leu Gln Glu Ala
20 25 30

Gly Ala Lys Arg Leu Ile Pro Leu Lys Val Ser Gly Pro Phe His Thr
35 40 45

Ala Leu Leu Glu Pro Ala Ser Gln Lys Leu Ala Glu Thr Leu Ala Gln
50 55 60

Val Ser Phe Ser Asp Phe Thr Cys Pro Leu Val Gly Asn Thr Glu Ala
65 70 75 80

Ala Val Met Gln Lys Glu Asp Ile Ala Gln Leu Leu Thr Arg Gln Val
85 90 95

Lys Glu Pro Val Arg Phe Tyr Glu Ser Ile Gly Val Met Gln Glu Ala
100 105 110

Gly Ile Ser Asn Phe Ile Arg Asp Trp Thr Gly Glu Ser Leu Val Arg
115 120 125

Phe Cys
130

(2) INFORMATION FOR SEQ ID NO: 180:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 29 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 180:

Val His Pro Thr Gly Pro Thr Pro Ala Thr Glu Thr Val Asp Ser Ile
1 5 10 15

Pro Gly Phe Glu Ala Pro Gln Glu Ser Val Thr Ile Leu
20 25

(2) INFORMATION FOR SEQ ID NO: 181: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 104 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 181:

Val Pro Thr Val Phe His Lys Ser Ala Gln Val Leu Glu Glu Glu Met
1 5 10 15

Asn Arg Tyr Gln Pro Asp Phe Val Leu Cys Ile Gly Gln Ala Gly Gly
20 25 30

Arg Thr Ser Leu Thr Pro Glu Arg Val Ala Ile Asn Gln Asp Asp Ala
35 40 45

Arg Thr Ser Asp Asn Glu Asp Asn Gln Pro Ile Asp Arg Pro Ile Arg
50 55 60

Pro Asp Gly Ala Ser Ala Tyr Phe Ser Ser Leu Pro Ile Lys Ala Met
65 70 75 80

Val Gln Ala Ile Lys Lys Lys Asp Tyr Arg Pro Leu Phe Pro Ile Arg
85 90 95

Gln Gly Leu Leu Ser Ala Ala Ile
100



(2) INFORMATION FOR SEQ ID NO: 182: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 128 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 182:

Val Leu Gln Val Gly Ser Gln Asp Tyr Val Phe Val Leu Gln Gln Asp
1 5 10 15

Lys Tyr Thr Ser Val Arg Asp Ile Leu Ser Asp Thr Ile Glu Ala Val
20 25 30

Glu Tyr Asp Phe Gly Leu Arg Leu Ser Ile Met Leu Gly Gln Val Trp
35 40 45

Ser Gln Thr Gly His Gln Ala Leu Ser Asp Leu Ile Lys Ala Glu Arg
50 55 60

Asp Leu Phe Lys Thr Trp Trp Arg Gln Gly His Gln Gly Val His Thr
65 70 75 80

Phe Ser Gln Leu Tyr Leu Trp Ser Leu Gly Glu Arg Leu Val Asp Leu
85 90 95

Lys Pro Ile Lys Glu Cys Leu His Gln Met Ile Leu Asp Gln Asp Gln
100 105 110

Ile Gln Glu Ile Ile Leu Ser Leu Trp Glu Asn Ser Ala Val Leu Thr
115 120 125

(2) INFORMATION FOR SEQ ID NO: 183: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 214 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 183:

Val Arg Arg Ser Asp Arg Tyr Ala Arg Glu Val Gly Ala Asp Cys Val
1 5 10 15

Gly Glu Phe Val Ser Ala Thr Lys Thr Tyr Pro Val Ser Phe Ile Asn
20 25 30

Tyr Lys Gly Glu Glu Val Cys Leu Asp Gln Ala Pro Ala Gly Ser Ala
35 40 45

Pro Ala Ala Gln Phe Met Asp Gly Leu Ile Gly Tyr Gly Val Glu Gln
50 55 60

Leu Ile Ser Thr Gly Thr Cys Gly Val Leu Ala Asp Ile Glu Glu Asn
65 70 75 80

Ala Phe Leu Val Pro Val Arg Ala Leu Arg Asp Glu Gly Ala Ser Tyr
85 90 95

His Tyr Val Ala Pro Cys Arg Tyr Met Glu Met Gln Pro Glu Ala Ile
100 105 110

Ala Ala Ile Glu Glu Val Leu Glu Asp Arg Gly Ile Pro Tyr Glu Glu
115 120 125

Val Met Thr Trp Thr Thr Asp Gly Phe Tyr Arg Glu Thr Ala Glu Lys
130 135 140

Val Ala Tyr Arg Lys Glu Glu Gly Cys Ala Val Val Glu Met Glu Cys
145 150 155 160

Ser Ala Leu Ala Ala Val Ala Gln Leu Arg Gly Val Leu Trp Gly Glu
165 170 175

Leu Leu Phe Thr Ala Asn Ser Leu Ala Asp Leu Asp Gln Tyr Asn Ser
180 185 190

Arg Asp Trp Gly Ser Glu Pro Phe Asn Lys Ala Leu Lys Leu Ser Leu
195 200 205

Ala Ser Val His His Leu
210



(2) INFORMATION FOR SEQ ID NO: 184: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 136 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 184:

Val Glu Asn Leu Thr Asn Phe Tyr Glu Lys Tyr Arg Val Tyr Leu Thr
1 5 10 15

Arg Pro Arg Leu Glu Leu Leu Ala Val Val Thr Ile Val Leu Xaa Ala
20 25 30

Val Leu Val Phe Phe Leu Asn Ile Pro Gly Lys Gly Val Leu Lys Leu
35 40 45

Asp Asn Gly Thr Ile Val Tyr Asp Gly Ser Leu Val Arg Gly Lys Met
50 55 60

Asn Gly Gln Gly Thr Ile Thr Phe Gln Asn Gly Asp Gln Tyr Thr Gly
65 70 75 80

Gly Phe Asn Asn Gly Ala Phe Asn Gly Lys Gly Thr Phe Gln Ser Lys
85 90 95

Glu Gly Trp Thr Tyr Glu Gly Asp Phe Val Asn Gly Gln Ala Glu Gly
100 105 110

Lys Gly Lys Leu Thr Thr Glu Gln Glu Val Val Tyr Glu Gly Thr Phe
115 120 125

Lys Gln Gly Val Phe Gln Gln Lys
130 135



(2) INFORMATION FOR SEQ ID NO: 185: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 53 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 185:

Val Phe Leu Lys Glu Ser Cys Gly Ser Gly Ala Gln Ile Ala Glu Thr
1 5 10 15

Phe His Gln Phe Gly Gly Asp Tyr Gly Phe Glu Thr Thr Asp Leu Asn
20 25 30

Phe Asn Phe Ala Thr Leu Arg Arg Asn Arg Glu Ala Tyr Ile Asp Arg
35 40 45

Ala Arg Ser Ser Leu
50

What is claimed is:

1. An isolated polynucleotide comprising a polynucleotide sequence selected 
from the group consisting of: 

(a) a polynucleotide having at least a 70% identity to a polynucleotide encoding 
a polypeptide comprising an amino acid sequence of Table 1; 

(b) a polynucleotide having at least a 70% identity to a polynucleotide encoding 
a mature polypeptide expressed by the gene contained in the S. pneumoniae of the deposited 
strain that was sequenced to obtain a polynucleotide sequence of Table 1;

(c) a polynucleotide encoding a polypeptide comprising an amino acid sequence
which is at least 70% identical to an amino acid sequence of Table 1;

(d) a polynucleotide which is complementary to the polynucleotide of (a), (b) or
(c); and

(e) a polynucleotide comprising at least 15 sequential bases of the polynucleotide 
of (a), (b), (c) or (d). 

2. The polynucleotide of Claim 1 wherein the polynucleotide is DNA. 

3. The polynucleotide of Claim 1 wherein the polynucleotide is RNA. 

4. The polynucleotide of Claim 2 comprising the nucleic acid sequence selected 
from the group consisting of the nucleic acid sequences set forth in Table 1.

5. The polynucleotide of Claim 2 which encodes a polypeptide comprising an
amino acid sequence selected from the group consisting of the amino acid sequences
set forth in Table 1.

6. A vector comprising the polynucleotide of Claim 1.

7. A host cell comprising the vector of Claim 6. 

8. A process for producing a polypeptide comprising: expressing from the host 
cell of Claim 7 a polypeptide encoded by said DNA. 

9. A process for producing a polypeptide or fragment comprising culturing a 
host of claim 7 under conditions sufficient for the production of said polypeptide or 
fragment. 

10. A polypeptide comprising an amino acid sequence which is at least 70% 
identical to an amino acid sequence selected from the group consisting of the amino acid 
sequences set forth in Table 1. 

11. A polypeptide comprising an amino acid sequence selected from the group 
consisting of the amino acid sequences set forth in Table 1.

12. An antibody against the polypeptide of claim 10.

13. An antagonist or agonist of the activity or expression of the polypeptide of 
claim 10. 

14. A method for the treatment or prevention of disease of an individual 
comprising: administering to the individual a therapeutically effective amount of the 
polypeptide of claim 10. 

15. A method for the treatment of an individual having need to inhibit a bacterial 
polypeptide comprising: administering to the individual a therapeutically effective amount of 
the antagonist of Claim 13. 

16. A process for diagnosing a disease related to expression or activity of the 
polypeptide of claim 10 in an individual comprising: 

(a) determining a nucleic acid sequence encoding said polypeptide, and/or 

(b) analyzing for the presence or amount of said polypeptide in a sample derived 
from the individual. 

17. A method for identifying compounds which interact with and inhibit or 
activate an activity of the polypeptide of claim 10 comprising: 

contacting a composition comprising the polypeptide with the compound to be 
screened under conditions to permit interaction between the compound and the polypeptide to 
assess the interaction of a compound, such interaction being associated with a second 
component capable of providing a detectable signal in response to the interaction of the 
polypeptide with the compound; 

and determining whether the compound interacts with and activates or inhibits an 
activity of the polypeptide by detecting the presence or absence of a signal generated from the 
interaction of the compound with the polypeptide. 

18. A method for inducing an immunological response in a mammal which 
comprises inoculating the mammal with the polypeptide of claim 10, or a fragment or 
variant thereof, adequate to produce antibody and/or T cell immune response to protect said 
animal from disease. 

19. A method of inducing immunological response in a mammal which comprises 
delivering a nucleic acid vector to direct expression of a polypeptide of claim 10, or 
fragment or a variant thereof, for expressing said polypeptide, or a fragment or a variant 
thereof in vivo in order to induce an immunological response to produce antibody and/or T
cell immune response to protect said animal from disease. 

20. A polynucleotide comprising a polynucleotide sequence selected from the
group consisting of the first ten polynucleotide sequences from the top of Table 1.

21. A polypeptide comprising a polypeptide encoded by the polynucleotide of
claim 20.

22. The isolated polynucleotide of claim 1 wherein said polynucleotide is selected
from the group consisting of: 

(a) a polynucleotide having at least a 90% identity to a polynucleotide encoding 
a polypeptide comprising the amino acid sequence of Table 1; 

(b) a polynucleotide having at least a 90% identity to a polynucleotide encoding 
the same mature polypeptide expressed by the gene contained in the S. pneumoniae of the 
deposited strain that was sequenced to obtain a polynucleotide sequence of Table 1 ; 

(c) a polynucleotide encoding a polypeptide comprising an amino acid sequence 
which is at least 90% identical to the amino acid sequence of Table 1 ; 

(d) a polynucleotide which is complementary to the polynucleotide of (a), (b) or 

(c); and 

(e) a polynucleotide comprising at least 15 sequential bases of the polynucleotide 
of (a), (b), (c) or (d). 

23. The isolated polynucleotide of claim 1 selected from the group consisting of: 

(a) a polynucleotide having at least a 95% identity to a polynucleotide encoding 
a polypeptide comprising the amino acid sequence of Table 1; 

(b) a polynucleotide having at least a 95% identity to a polynucleotide encoding 
the same mature polypeptide expressed by the gene contained in the S. pneumoniae of the 
deposited strain that was sequenced to obtain a polynucleotide sequence of Table 1; 

(c) a polynucleotide encoding a polypeptide comprising an amino acid sequence 
which is at least 95% identical to the amino acid sequence of Table 1; 

(d) a polynucleotide which is complementary to the polynucleotide of (a), (b) or 
(c); and 

(e) a polynucleotide comprising at least 15 sequential bases of the polynucleotide 
of (a), (b), (c) or (d). 

24. An isolated polynucleotide comprising a polynucleotide sequence selected 
from the group consisting of: 
(a) a polynucleotide having at least a 50% identity to a polynucleotide encoding 
a polypeptide comprising the amino acid sequence of Table 1 and obtained from a prokaryotic 
species other than S. pneumoniae; 

(b) a polynucleotide encoding a polypeptide comprising an amino acid sequence 
which is at least 50% identical to the amino acid sequence of Table 1 and obtained from a 
prokaryotic species other than S. pneumoniae; and 

(c) a polynucleotide which is complementary to the polynucleotide of (a) or (b). 

25. An isolated Streptococcal polypeptide having one of the amino acid 
sequences given in Table 1. 

26. An isolated nucleic acid encoding one of the amino acid sequences of 
Claim 1 and nucleic acid sequences capable of hybridizing therewith under stringent 
conditions. 

27. Recombinant vectors comprising the nucleic acid sequences of 
Claim 26 and host cells transformed or transfected therewith. 

28. A method of identifying an antimicrobial compound comprising contacting 
candidate compounds with a polypeptide of Claim 1 and selecting those compounds 
capable of inhibiting the bioactivity of said polypeptide. 

29. Antimicrobial compounds identified by the method of Claim 28. 

30. An isolated Streptococcal polypeptide having one of the amino acid 
sequences given in Table 1. 

31. An isolated nucleic acid encoding one of the amino acid sequences of 
Claim 30 and nucleic acid sequences capable of hybridizing therewith under stringent 
conditions. 

32. Recombinant vectors comprising the nucleic acid sequences of 
Claim 31 and host cells transformed or transfected therewith. 

33. A method of identifying an antimicrobial compound comprising contacting 
candidate compounds with a polypeptide of Claim 30 and selecting those compounds 
capable of inhibiting the bioactivity of said polypeptide. 

34. Antimicrobial compounds identified by the method of Claim 33. 